We consider a quantum state shared between many distant locations, and define a quantum information
processing primitive, state merging, that optimally merges the state into one location. As
announced in [Horodecki, Oppenheim, Winter, Nature 436, 673 (2005)], the optimal entanglement
cost of this task is the conditional entropy if classical communication is free. Since this quantity
can be negative, and the state merging rate measures partial quantum information, we find that
quantum information can be negative. The classical communication rate also has a minimum rate:
a certain quantum mutual information. State merging enabled one to solve a number of open problems:
distributed quantum data compression, quantum coding with side information at the decoder
and sender, multi-party entanglement of assistance, and the capacity of the quantum multiple access
channel. It also provides an operational proof of strong subadditivity. Here, we give precise
definitions and prove these results rigorously.



I. INTRODUCTION

The field of quantum information theory is still in its infancy, with many of the key building blocks of the theory
not yet in place or not well understood. This is perhaps not surprising, since the important elements of classical
information theory have only been in place since the 70's. The notion of classical information was first introduced
by Shannon [1] who defined it operationally, as the minimum number of bits needed to communicate the message
produced by a statistical source. This gave meaning to the entropy H(X) of the source producing a random variable
X. The amount of information that two random variables X and Y have in common was given a meaning through
the mutual information I(X : Y) = H(X) + H(Y) - H(XY). Operationally it is the rate of communication possible
through a noisy channel taking X to Y. The fundamental Shannon theorems treated two basic questions: how many
bits does one need to transmit a message from a source? How many bits can one send via a noisy channel?

Another basic brick in classical information theory, which is a generalization of the noiseless coding problem, is the
notion of partial information. The question is now, how many bits does the sender (Alice) need to send to transmit
a message from the source, provided the receiver (Bob) already has some prior information about the source. This
number of bits we call the partial information. Slepian and Wolf showed that partial information is equal to the
entropy of the source reduced by the mutual information [2]. This quantity is equal to what is called conditional
entropy H(X|Y) = H(XY) - H(Y). It is actually an entropy, and was originally defined as the average entropy of
conditional probability distributions:

$$ H(X|Y) = -\sum_{x,y} p_Y(y)\, p_{X|Y}(x|y) \log p_{X|Y}(x|y), \tag{1} $$

with $p_{X|Y}(x|y)$ the probability of the source producing symbol x conditioned on the fact that Bob has y, and $p_Y(y)$
the probability that y is produced at Bob's site.
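
As a small numerical illustration of eq. (1), the following sketch computes the conditional entropy as the average entropy of the conditional distributions and compares it with H(XY) - H(Y). The joint distribution `joint` and the function names are hypothetical and chosen only for this example.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(X|Y) as in eq. (1): -sum_{x,y} p_Y(y) p(x|y) log p(x|y)."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    total = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            # p = p_Y(y) * p(x|y), so p / p_Y(y) is the conditional probability
            total -= p * math.log2(p / p_y[y])
    return total

# Hypothetical joint distribution p(x, y), keyed by (x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

h_xy = entropy(joint.values())            # total information H(XY)
h_y = entropy([0.5, 0.5])                 # marginal of Y for this example
h_x_given_y = conditional_entropy(joint)  # partial information H(X|Y)
```

For this distribution both expressions give half a bit: when Bob holds y = 0 he already knows x, and when he holds y = 1 Alice's symbol is a fair coin.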

This discovery of Slepian and Wolf clarified the picture of correlated sources: mutual information is the knowledge
common to both Alice and Bob. Entropy of Alice's source is its full information content. The difference between
the two is the information that Bob needs to complete his prior knowledge about Alice's source (Figure 1). It thus
provided an information theoretic basis for the conditional entropy. It should be noted that it is a highly non-trivial
operation, since Alice is able to communicate to Bob the full information about her string $X_1 \ldots X_n$, even though she
is unaware of what string $Y_1 \ldots Y_n$ Bob has.

The quantities and operational meaning of the entropy, mutual information, and conditional entropy thus form the
basic building blocks of classical information theory. We are interested in finding the corresponding basic elements in
quantum information theory. The first step was done by Schumacher [3], who showed that the von Neumann entropy
plays an analogous role to Shannon entropy: it has the operational interpretation of the number of qubits needed to
transmit quantum states emitted by a statistical source.

concept               quantity    operational meaning
--------------------  ----------  ----------------------------------------------
information           H(X)        The rate at which a source can convey
                                  messages (Shannon compression)
mutual information    I(X : Y)    For an input X which produces Y after being
                                  sent down a channel, I(X : Y) is the rate at
                                  which information can be sent reliably
                                  (channel coding)
partial information   H(X|Y)      The rate at which messages X can be sent to
                                  a party who has prior information Y
                                  (Slepian-Wolf theorem)

TABLE I: Key concepts in classical information theory



The next step was to find an analogue of the noisy coding theorem. Here it turned out that the analogy was not
very strict: the quantum analogue of mutual information cannot be obtained by replacing Shannon entropies with
von Neumann ones. It was found that the capacity of the quantum channel is determined by a different quantity -
the coherent information [4, 5]. The coherent information, defined for a bipartite state $\rho_{AB}$, is

$$ I(A\rangle B) = S(B) - S(AB), \tag{2} $$

and the channel capacity is obtained [6-8] by maximising it over input states $\rho_A$. Here, S(B) and S(AB) are the von
Neumann entropies of the states $\rho_B = \mathrm{Tr}_A\, \rho_{AB}$ and $\rho_{AB}$, and we adopt the notation of dropping the explicit dependence
on $\rho$ when such dependence is obvious.

With the coherent information, there was a persistent mystery - for any particular input $\rho_A$, the quantity S(B) -
S(AB) of eq. (2) could be negative, and it was not known how to interpret such a quantity, as it indicated a sense in which
the channel capacity could be negative for such input distributions. Thus it is often the case that for a particular
channel, no input gives a positive value, and one should set the coherent information to zero, by inputting
the null distribution (any pure state).

Turning next to a quantum analogue of prior and partial information, there had previously not been any such notion;
a quantum scenario like that of Slepian-Wolf appeared intractable [9]. Another serious obstacle in the quantum world
is that there are no conditional probabilities, hence conditional entropy cannot be defined. Conditional probabilities
only exist after one performs a measurement which of course destroys the state. One may try to overcome this
difficulty, by naively replacing Shannon entropies with von Neumann ones in the formula for conditional entropy, so
that quantum conditional entropy would be the difference between the total entropy and the entropy of a subsystem:

$$ S(A|B) = S(AB) - S(B). \tag{3} $$

Such an approach has been strongly advocated [10]; however, while this "H goes to S" rule works for defining information,
it doesn't work for channel capacity, as mentioned above. It is thus not clear that it is the correct thing to do. However
there is a more serious obstacle here: the conditional entropy defined by taking H to S can be negative [10-12]. In [12]
this problem was connected with quantum entanglement. Likewise for maximally entangled states, it was connected
with the ability to perform teleportation [10]. It had already been noted by Schrödinger that an entangled state may
possess a weird feature: if a system is in such a state we may know more about the whole system than about
subsystems. In [12], Schrödinger's intuition was quantified by von Neumann entropies, and it was found that the
entropy of a subsystem can be greater than the entropy of the total system only when the state is entangled. It was
however also found that there are entangled states that do not exhibit this weird property. Thus there was a question:
what does it mean, that for some states we have such behaviour, and not for other states?
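
The negativity discussed above can be made concrete with a minimal sketch, restricted (as an assumption of this example) to the pure two-qubit family cos(theta)|00> + sin(theta)|11>. For such states S(AB) = 0 and S(B) is the binary entropy of the Schmidt spectrum, so no matrix diagonalisation is needed. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of the distribution (p, 1 - p)."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

def conditional_entropy_pure(theta):
    """S(A|B) = S(AB) - S(B) for |psi> = cos(theta)|00> + sin(theta)|11>."""
    s_ab = 0.0                                   # a pure joint state has zero entropy
    s_b = binary_entropy(math.cos(theta) ** 2)   # spectrum of Bob's reduced state
    return s_ab - s_b

product = conditional_entropy_pure(0.0)          # |00>, no entanglement
partial = conditional_entropy_pure(0.3)          # weakly entangled state
maximal = conditional_entropy_pure(math.pi / 4)  # maximally entangled (Bell) state
```

Within this family the conditional entropy is zero for the product state and decreases to -1 for the Bell state. The sketch does not cover mixed states, where, as noted above, entangled states with non-negative conditional entropy also exist.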

It doesn't help that -S(A|B) is nothing but the coherent information, that determines channel capacity [6-8]! How
can the duality between channel coding and Slepian-Wolf compression be conserved in any quantum analog?

In our recent paper [13], we approached the problem of quantifying partial and prior information from a purely 
operational point of view. Inspired by the classical Slepian-Wolf theorem, we consider the scenario in which an 
unknown quantum state is distributed over two systems. We determined how much quantum communication is 
needed to transfer the full state to one system. This communication measures the partial information one system 
needs conditioned on its prior information. We found that the partial information is given by the conditional entropy, 
just as in the classical case. However, in the classical case, partial information must always be positive, while 
in the quantum world we find this physical quantity can be negative. If the partial information is positive, its 
sender needs to communicate this number of quantum bits to the receiver to achieve state transfer; if it is negative, 
the state can be transferred, and in addition, the sender and receiver gain the corresponding potential for future

FIG. 1: A graphical representation of the building blocks of classical information theory. The total information of the source
producing pairs of random variables X, Y is H(XY), while the information contained in just variable X (Y) is H(X) (H(Y)).
The information common to both variables is the mutual information I(X : Y), while the partial informations are H(X|Y) and
H(Y|X). In the quantum case, the quantum mutual information I(A : B) can be greater than the total information S(AB),
which can be also greater than the local informations S(A) and S(B). To compensate, the partial informations S(A|B) and
S(B|A) can be negative.



concept                       quantity          operational meaning
----------------------------  ----------------  ------------------------------------------
quantum information           S(A)              The rate at which a source can convey
                                                quantum states (Schumacher compression)
coherent information          $I(A\rangle B)$   For an input which produces $\rho_{AB}$ after
                                                being sent down a channel, $I(A\rangle B)$ is
                                                the rate at which quantum information can
                                                be sent reliably down the channel (quantum
                                                channel coding). **Merging allows us to
                                                interpret the negative values of this
                                                quantity**
**partial quantum             **S(A|B)**        **The rate at which quantum states with
information**                                   density matrix $\rho_A$ can be sent to a
                                                party who has prior quantum information
                                                $\rho_B$**

TABLE II: Key concepts in quantum information theory with additions due to merging highlighted in bold

quantum communication. This potential communication is in the form of pure entangled states which can be used to
teleport quantum states. Thus viewing entanglement as a potential for quantum communication, we see that when
the conditional entropy is positive, entanglement needs to be consumed, while when it is negative, entanglement is
gained.

One can view it in another way - the entropy S(B) quantifies how much Bob knows (in the sense of possessing
the state), while the entropy S(AB) quantifies how much there is to know. Since quantum distributions can have
S(AB) < S(B), there is a sense in which Bob knows too much. If Alice were to send her full state to him, at a cost
of S(A), then he ends up having entropy S(AB) - in the quantum world, after you receive negative information, you
know less.

The primitive which (optimally) transfers partial information we call quantum state merging, as Alice's state is
effectively merged with Bob's state, arriving at his site. With this primitive in hand, one can gain a systematic
understanding of quantum network theory, including several important applications such as distributed compression,
multiple access channels, assisted entanglement distillation (localizable entanglement), and compression with
quantum side information.

The purpose of the current paper is to provide full proofs for the result of [13]. In Section II we formally define the
notion of quantum state merging, and state the main result. In Section III we exhibit a general condition to ensure
state merging and derive a one-shot protocol based on random measurements. In Section IV we prove the main
theorem, show that our protocol has the optimal classical communication rate, and provide a heuristic explanation of
why the conditional entropy comes into play.

Once the primitive of state merging has been put on a firm footing, we are able to use it to solve a number of
previously intractable problems. A broad outline of these applications was given in [13], and here we provide more
details. In Section V we look at the problem of distributed compression, where several parties at different sites
individually compress a source, which is then decoded by a single party. It is found that the parties can compress
at the ideal rate of the total entropy even though they are distributed. In Section VI, we look at noiseless coding
with side information, i.e. we consider the problem where one party (Alice) wishes to compress her state to send to
a decoder, and a second party (Bob) who holds part of the total state can aid her by sending part of his state. The
decoder only wishes to decode the state of Alice, while Bob's state is only used to help in the decoding. As a corollary
we find that if there is a single encoder Alice who has access to side information, then this can help her in sending
information to a decoder, a situation impossible in the classical case.

Next, in Section VII, we treat entanglement of assistance [14] in the case of many helping parties (a concept similar
to localizable entanglement [15]). A pure state is shared by many parties, and the goal is to distill the maximum
amount of entanglement between two of the parties. The other parties can aid in this distillation through local
operations and classical communication. We find that state merging gives the optimal rate of distillation.

We then consider the quantum multiple access channel, in Section VIII. Two parties, Alice and Bob, wish to send
quantum states to a decoder through a channel which acts on both their states. We find optimal rates using state
merging, and derive the full rate region. We are also able to provide an interpretation to the longstanding puzzle of
negative coherent information in the formula for the capacity of the quantum channel. Namely, if one party's rate is
negative, then this is the amount of entanglement he or she must invest in order to help the other party achieve the
maximum rate.

Before concluding in Section X, we provide a quick and intuitive proof of strong subadditivity using state merging 
in Section IX. 



II. STATE MERGING: CONCEPT, DEFINITIONS AND MAIN RESULT

Consider a source emitting a sequence of unknown bipartite pure states $|\psi_1\rangle_{AB} |\psi_2\rangle_{AB} |\psi_3\rangle_{AB} \cdots$ from a distribution,
with average density matrix $\rho_{AB}$. As with Schumacher compression, we assume the density matrix of the source is
known to the two parties Alice and Bob, but they don't know the ensemble which realises it. I.e., for any given
state they possess, the state is unknown, although the statistics of the source are. We are interested in information
theoretic quantities, and in particular, we are interested in quantifying quantum information. We thus allow free
classical communication between the two parties, and consider many copies n of the state $\rho_{AB}$. We now ask how
much quantum communication is needed for Alice to transfer the unknown sequence of states $|\psi_1\rangle_{AB} |\psi_2\rangle_{AB} |\psi_3\rangle_{AB} \cdots$
to Bob's site. This we call quantum state merging. Notice that because classical communication is free, we can replace
quantum communication by entanglement due to teleportation [16] - this will be a more convenient way of accounting
for the quantum resources. Faithful state merging means that the fidelity of the sequence of states is kept for any
realisation of the density matrix.

There is an equivalent, yet more elegant way to conceive of this problem. We imagine that the state $\rho_{AB}$ is part of
a larger pure state $\psi_{ABR} = |\psi\rangle\langle\psi|_{ABR}$, with a state vector $|\psi\rangle_{ABR}$ which also lives on a reference (or environment)
system R. Faithful state transfer means that the transferred state has high fidelity with the original state $\psi_{ABR}$.
More formally, we define:

Definition 1 (State merging) Consider a pure state $|\psi\rangle_{ABR}$ shared between two parties A, B and a reference R.
Let Alice and Bob have further registers $A_0$, $A_1$ and $B_0$, $B_1$, respectively. We call a joint operation
$M : AA_0 \otimes BB_0 \to A_1 \otimes B_1B'B$ state merging of $\psi$ with error $\epsilon$, if it is LOCC and, with
$\rho_{A_1B_1B'BR} = (M \otimes \mathrm{id}_R)\big(\psi_{ABR} \otimes (\Phi_K)_{A_0B_0}\big)$,

$$ \left\| \rho_{A_1B_1B'BR} - (\Phi_L)_{A_1B_1} \otimes \psi_{B'BR} \right\|_1 \le \epsilon, \tag{4} $$

with maximally entangled states $\Phi_K$, $\Phi_L$ on $A_0B_0$, $A_1B_1$ of Schmidt rank K, L, respectively. Here, B' is a local
ancilla of Bob's of the same size as A. The number log K - log L is called the entanglement cost of the protocol.

In the case of many copies of the same state, $\Psi = \psi^{\otimes n}$, we call $\frac{1}{n}(\log K - \log L)$ the entanglement rate of the
protocol. A real number R is called an achievable rate if there exist, for $n \to \infty$, merging protocols of rate approaching
R and error approaching 0. The smallest achievable rate is the merging cost of $\psi$.

The main purpose of this paper is to prove in detail the result announced in [13], namely that the merging cost is
equal to the conditional entropy of the state $\rho_{AB}$ shared by Alice and Bob, S(A|B) = S(AB) - S(B).

Theorem 2 (Quantum State Merging) For a state $\rho_{AB}$ shared by Alice and Bob, the entanglement cost of merging
is equal to the quantum conditional entropy S(A|B) = S(AB) - S(B), in the following sense. When S(A|B)
is positive, then merging is possible if and only if R > S(A|B) ebits per input copy are provided. When S(A|B)
is negative, then merging is possible by local operations and classical communication, and moreover R < -S(A|B)
maximally entangled states are obtained per input copy.

Our strategy of proof will be the following. We first show that if the quantity is negative, then merging can be
done by LOCC (indeed, with only one-way communication from Alice to Bob), and the entanglement rate that can
be obtained is equal to minus the conditional entropy. Using this we will show that in the case of positive conditional
entropy it is enough to spend S(A|B) ebits of entanglement.

Finally, we will show that the rates given by the conditional entropy are optimal. We will also show that the
classical communication cost is equal to the quantum mutual information between Alice and the reference system R,

$$ I(A : R) = S(A) + S(R) - S(AR), \tag{5} $$

and prove its optimality.
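
To make the two resource counts tangible, here is a minimal bookkeeping sketch. It assumes only that the overall state on ABR is pure, so that S(R) = S(AB) and S(AR) = S(B). The function name and the two test cases are hypothetical illustrations, not part of the proofs.

```python
def merging_costs(s_a, s_b, s_ab):
    """Per-copy resources for merging A into B, from entropies of rho_AB (in bits).

    Returns (entanglement cost S(A|B), classical cost I(A:R)).
    Purity of psi_ABR is assumed: S(R) = S(AB) and S(AR) = S(B).
    """
    entanglement_cost = s_ab - s_b       # S(A|B) = S(AB) - S(B)
    classical_cost = s_a + s_ab - s_b    # I(A:R) = S(A) + S(R) - S(AR)
    return entanglement_cost, classical_cost

# Alice's qubit is maximally entangled with R and Bob holds nothing.
teleport = merging_costs(s_a=1.0, s_b=0.0, s_ab=1.0)

# Alice and Bob share a Bell pair and the reference is trivial.
bell = merging_costs(s_a=1.0, s_b=1.0, s_ab=0.0)
```

The first case reproduces the familiar cost of teleporting one qubit, one ebit and two classical bits. In the second case the cost is -1 ebit with no communication: Bob can prepare the target state locally, and the Bell pair is kept as a resource.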



III. ONE-SHOT STATE MERGING 



In this section, we first formulate a general sufficient condition on a measurement of Alice that ensures that Bob can 
complete state merging by local operations; then we show how random measurements succeed with high probability 
in realising this condition. 



A. Condition for merging with one-way LOCC 

Here we will provide a condition that is sufficient to obtain state merging with only LOCC. We formulate it in the
one-shot setting of Definition 1. It is based on Alice performing a measurement which takes the original state $\psi_{ABR}$
to another pure state, with the essential features that: (1) the reference system R is unchanged, and (2) Alice's and
the Reference's states are in product form. Since all purifications are equal up to local unitaries, this implies that
Bob can perform a local unitary which transforms his state into $\rho_{AB}$.

More formally, we consider a protocol, whose basic constituent is Alice's incomplete measurement given by Kraus
operators $P_j$ mapping A to $A_1$ (in our actual solution, it will be a von Neumann measurement followed by a unitary).
Given the outcome was j, the state $\psi_{ABR}$ collapses to a state which we will denote by $\psi^j_{A_1BR}$,

$$ |\psi^j\rangle_{A_1BR} = \frac{1}{\sqrt{p_j}} (P_j \otimes I_{BR}) |\psi\rangle_{ABR}, \tag{6} $$

where $p_j$ is the probability of obtaining outcome j,

$$ p_j = \langle\psi| (P_j^\dagger P_j \otimes I_{BR}) |\psi\rangle. \tag{7} $$

Suppose for the moment that $|\psi^j\rangle_{A_1BR}$ has the property

$$ \rho^j_{A_1R} = \tau_{A_1} \otimes \rho_R, \tag{8} $$

where $\rho^j_{A_1R}$ is the reduced density matrix of $\psi^j_{A_1BR}$, $\tau_{A_1}$ is the maximally mixed state of dimension L on Alice's
system $A_1$, and $\rho_R$ is the reduced density matrix of the original state $\psi_{ABR}$. Then (see [17]) there exists an isometry
$U_j : B \to B_1B'B$ on Bob's side, such that

$$ (I_{A_1R} \otimes U_j) |\psi^j\rangle_{A_1BR} = |\Phi_L\rangle_{A_1B_1} \otimes |\psi\rangle_{B'BR}, \tag{9} $$

where $|\psi\rangle_{B'BR}$ is the original state $|\psi\rangle_{ABR}$ with the system B' substituted for A. This is because $\psi$ is the purification
of $\rho_R$ and $\Phi_L$ that of $\tau_{A_1}$, so both $\psi^j_{A_1BR}$ and $(\Phi_L)_{A_1B_1} \otimes \psi_{B'BR}$ are purifications of $\tau_{A_1} \otimes \rho_R$. Hence, by Uhlmann's
theorem, they are related by a unitary on Bob's system.

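Since we require fidelity approaching 1 only in the asymptotic limit, we obtain the following merging condition:

Proposition 3 (Merging condition) Consider Alice's measurement with outcomes j, which occur with probability
$p_j$. Denote the state after the measurement result j was obtained by $\psi^j_{A_1BR}$, and its reduced density matrix by
$\rho^j_{A_1R}$. The following condition implies the existence of a merging protocol with entanglement cost -log L and error
$2\sqrt{\epsilon}$: that the so-called quantum error $Q_e$ satisfies

$$ Q_e := \sum_j p_j \left\| \rho^j_{A_1R} - \tau_{A_1} \otimes \rho_R \right\|_1 \le \epsilon, \tag{10} $$

where $\tau_{A_1}$ is the maximally mixed state of dimension L on $A_1$.

Proof. The proof is based on the above considerations concerning the ideal situation. Using the relation eq. (A4)
between the trace distance and the fidelity, we get

$$ \sum_j p_j F\big(\rho^j_{A_1R}, \tau_{A_1} \otimes \rho_R\big) \ge 1 - \epsilon. \tag{11} $$

Then, by Uhlmann's theorem [18, 19] there exist isometries $U_j$ of Bob such that

$$ F\big(\rho^j_{A_1R}, \tau_{A_1} \otimes \rho_R\big) = F\big((I_{A_1R} \otimes U_j)|\psi^j\rangle_{A_1BR}, |\Phi_L\rangle_{A_1B_1} \otimes |\psi\rangle_{B'BR}\big), \tag{12} $$

hence

$$ \sum_j p_j F\big((I_{A_1R} \otimes U_j)|\psi^j\rangle_{A_1BR}, |\Phi_L\rangle_{A_1B_1} \otimes |\psi\rangle_{B'BR}\big) \ge 1 - \epsilon. \tag{13} $$

So, with the output state of the protocol,

$$ \rho_{A_1B_1B'BR} = \sum_j p_j (I_{A_1R} \otimes U_j)\, \psi^j_{A_1BR}\, (I_{A_1R} \otimes U_j)^\dagger, \tag{14} $$

we obtain

$$ F\big(\rho_{A_1B_1B'BR}, (\Phi_L)_{A_1B_1} \otimes \psi_{B'BR}\big) \ge 1 - \epsilon. \tag{15} $$

And using the relation (A4) between fidelity and trace distance once more, we arrive at

$$ \left\| \rho_{A_1B_1B'BR} - (\Phi_L)_{A_1B_1} \otimes \psi_{B'BR} \right\|_1 \le 2\sqrt{\epsilon}, \tag{16} $$

which concludes the proof. □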

Note that for any protocol which achieves merging, the condition (10) must necessarily be met at some stage of the
protocol. This is because in order for the final state to be close to the original state, $\rho_R$ must necessarily be virtually
unchanged, and in order for the state to be at Bob's site, Alice's state must necessarily be in a product state with
the reference system R.

B. One-shot merging by random measurement 

Here we will prove an abstract, one-shot version of the main theorem, showing that a random orthogonal measurement
of rank-L projectors (and a little remainder) achieves merging.

Proposition 4 (One-shot merging) Let $\psi_{ABR}$ be a pure state, with local dimensions $d_A$, $d_B$, $d_R$, and $\mathrm{Tr}\,\rho_{AR}^2 \le \frac{1}{D}$.
Then there exists a POVM consisting of $N = \lfloor d_A / L \rfloor$ projectors of rank L and one of rank $L' = d_A - NL < L$ such that

$$ Q_e \le 2\sqrt{\frac{L\, d_R}{D}} + 2\frac{L}{d_A}, \tag{17} $$

and there is a merging protocol with error at most $2\sqrt{2\sqrt{L\, d_R / D} + 2 L / d_A}$.

In fact, by choosing the measurement at random according to the Haar measure on A, the expectation of the left
hand side of eq. (17) is upper bounded by the right hand side.

Remark 5 Let us explain here briefly how we will use the Proposition in the proof of Theorem 2 in the case of negative
S(A|B). Namely, we will apply it with the following parameters: $d_R \approx 2^{nS(R)} = 2^{nS(AB)}$, $d_A \approx 2^{nS(A)}$,
$D \approx 2^{nS(AR)} = 2^{nS(B)}$, where n is the number of copies of the initial state $\psi_{ABR}$ shared by Alice, Bob and the reference system.
Moreover L will be related to the rate r of singlets obtained between Alice and Bob in the process of merging: $L \approx 2^{nr}$.
Then the expression for the quantum error will be

$$ Q_e \approx 2 \cdot 2^{\frac{n}{2}\left(S(AB) - S(B) + r\right)} + 2^{n\left(r - S(A)\right) + 1}. \tag{18} $$

Thus if only r < S(B) - S(AB) = -S(A|B), then the quantum error will decay exponentially with n.
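
The following sketch evaluates the one-shot bound 2 sqrt(L d_R / D) + 2 L / d_A of eq. (17) and the heuristic i.i.d. estimate of eq. (18), to show how the error behaves below and above the rate -S(A|B). All numerical values (dimensions, entropies, rates) are hypothetical and chosen only for illustration.

```python
import math

def one_shot_bound(d_a, d_r, big_d, rank_l):
    """Measurement layout and error bounds for the one-shot statement.

    Returns (N, L', bound on Q_e, merging error 2*sqrt(bound)).
    """
    n_proj = d_a // rank_l                # N = floor(d_A / L) projectors of rank L
    remainder = d_a - n_proj * rank_l     # L' = d_A - N L < L
    q_bound = 2 * math.sqrt(rank_l * d_r / big_d) + 2 * rank_l / d_a
    return n_proj, remainder, q_bound, 2 * math.sqrt(q_bound)

def iid_error_estimate(n, r, s_a, s_b, s_ab):
    """Heuristic estimate of eq. (18), with L ~ 2^(n r) and entropies in bits."""
    return 2 * 2 ** (0.5 * n * (s_ab - s_b + r)) + 2 ** (n * (r - s_a) + 1)

n_proj, remainder, q_bound, error = one_shot_bound(d_a=1024, d_r=4, big_d=4096, rank_l=3)

# Hypothetical entropies with S(A|B) = S(AB) - S(B) = -1.
below = [iid_error_estimate(n, 0.5, 1.0, 1.0, 0.0) for n in (10, 100, 200)]  # r < -S(A|B)
above = iid_error_estimate(100, 1.2, 1.0, 1.0, 0.0)                          # r > -S(A|B)
```

With the rate r below -S(A|B) the estimate shrinks exponentially in n, while above it the first term grows and the bound becomes useless.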



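The crucial technical result in the proof of Proposition 4 will be the following statement about random (Haar
distributed) rank-L projectors:

Lemma 6 Let $P : A \to A_1$ be a random partial isometry of rank L, i.e. $P^\dagger P$ is a projection onto an L-dimensional
subspace of A. For example, one might put $P = P_0 U$ with some fixed rank-L projector $P_0$ onto a subspace $A_1$ of A,
and a Haar distributed unitary U on A. For the subnormalized density matrix

$$ \omega_{A_1R} = (P \otimes I_R)\, \rho_{AR}\, (P \otimes I_R)^\dagger, $$

observe that its average over unitaries U is

$$ \langle \omega_{A_1R} \rangle = \frac{L}{d_A}\, \tau_{A_1} \otimes \rho_R. $$

And we have:

$$ \left\langle \left\| \omega_{A_1R} - \frac{L}{d_A}\, \tau_{A_1} \otimes \rho_R \right\|_2^2 \right\rangle \le \frac{L^2}{d_A^2 D}, \tag{19} $$

$$ \left\langle \left\| \omega_{A_1R} - \frac{L}{d_A}\, \tau_{A_1} \otimes \rho_R \right\|_1 \right\rangle \le \frac{L}{d_A} \sqrt{\frac{L\, d_R}{D}}. \tag{20} $$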


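Proof. In Appendix A we recall the basic properties of the trace norm $\|\cdot\|_1$ and the Hilbert-Schmidt norm $\|\cdot\|_2$. From
there (Lemma 13) we take that $\|X\|_1 \le \sqrt{d}\, \|X\|_2$ for an operator on a d-dimensional space. This, and the concavity
of the square root function, show that eq. (19) implies eq. (20).

To prove eq. (19), we use the fact that it has the form of a variance, so

$$ \left\langle \left\| \omega_{A_1R} - \frac{L}{d_A}\, \tau_{A_1} \otimes \rho_R \right\|_2^2 \right\rangle
   = \left\langle \mathrm{Tr}\, \omega_{A_1R}^2 \right\rangle - \mathrm{Tr} \left\langle \omega_{A_1R} \right\rangle^2
   = \left\langle \mathrm{Tr}\, \omega_{A_1R}^2 \right\rangle - \frac{L}{d_A^2}\, \mathrm{Tr}\, \rho_R^2. \tag{21} $$

To evaluate the average of $\mathrm{Tr}\, \omega_{A_1R}^2$, we use the well-known equation

$$ \mathrm{Tr}\, \omega_{A_1R}^2 = \mathrm{Tr}\big( (\omega_{A_1R} \otimes \omega_{A_1'R'}) (F_{A_1A_1'} \otimes F_{RR'}) \big), \tag{22} $$

where we have introduced copies of all systems involved (denoted by primes), and with the swap (or flip) operator F exchanging the two
systems. (Note that $F_{ARA'R'} = F_{AA'} \otimes F_{RR'}$.) With this, and w.l.o.g. assuming that $A_1$ is a subspace of A,

$$ \begin{aligned}
\left\langle \mathrm{Tr}\, \omega_{A_1R}^2 \right\rangle
 &= \left\langle \mathrm{Tr}\big( (\omega_{A_1R} \otimes \omega_{A_1'R'}) (F_{A_1A_1'} \otimes F_{RR'}) \big) \right\rangle \\
 &= \left\langle \mathrm{Tr}\big( (UU_{AA'} \otimes I_{RR'}) (\rho_{AR} \otimes \rho_{A'R'}) (UU_{AA'} \otimes I_{RR'})^\dagger (F_{A_1A_1'} \otimes F_{RR'}) \big) \right\rangle \\
 &= \left\langle \mathrm{Tr}\big( (\rho_{AR} \otimes \rho_{A'R'}) (UU_{AA'} \otimes I_{RR'})^\dagger (F_{A_1A_1'} \otimes F_{RR'}) (UU_{AA'} \otimes I_{RR'}) \big) \right\rangle \\
 &= \mathrm{Tr}\big( (\rho_{AR} \otimes \rho_{A'R'}) \big( \left\langle (UU_{AA'})^\dagger F_{A_1A_1'} (UU_{AA'}) \right\rangle \otimes F_{RR'} \big) \big),
\end{aligned} \tag{23} $$

where we have used the shorthand $UU_{AA'} = U_A \otimes U_{A'}$. In Appendix B we demonstrate how, using elementary
arguments from the representation theory of $U \otimes U$, one can calculate that

$$ \left\langle (UU_{AA'})^\dagger F_{A_1A_1'} (UU_{AA'}) \right\rangle
   = \frac{L}{d_A} \frac{d_A - L}{d_A^2 - 1}\, I_{AA'} + \frac{L}{d_A} \frac{L d_A - 1}{d_A^2 - 1}\, F_{AA'}. \tag{24} $$

Inserting this into eq. (23) gives

$$ \left\langle \mathrm{Tr}\, \omega_{A_1R}^2 \right\rangle
   = \frac{L}{d_A} \frac{d_A - L}{d_A^2 - 1}\, \mathrm{Tr}\, \rho_R^2 + \frac{L}{d_A} \frac{L d_A - 1}{d_A^2 - 1}\, \mathrm{Tr}\, \rho_{AR}^2
   \le \frac{L}{d_A^2}\, \mathrm{Tr}\, \rho_R^2 + \frac{L^2}{d_A^2} \frac{1}{D}, \tag{25} $$

and looking at eq. (21) we are done. □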
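
The coefficients in eq. (24) can be sanity-checked with exact rational arithmetic. The sketch below assumes the form alpha I + beta F with alpha = (L/d_A)(d_A - L)/(d_A^2 - 1) and beta = (L/d_A)(L d_A - 1)/(d_A^2 - 1). It checks that this operator has the same trace (L) and the same overlap with the full swap (L^2) as the swap on the L-dimensional subspace, both of which the Haar average must preserve, and that the two bounds used to pass to the right-hand side of eq. (25) hold. The function names are hypothetical.

```python
from fractions import Fraction

def twirl_coefficients(d_a, rank_l):
    """Coefficients (alpha, beta) of the identity and the swap in eq. (24)."""
    alpha = Fraction(rank_l, d_a) * Fraction(d_a - rank_l, d_a**2 - 1)
    beta = Fraction(rank_l, d_a) * Fraction(rank_l * d_a - 1, d_a**2 - 1)
    return alpha, beta

def check(d_a, rank_l):
    alpha, beta = twirl_coefficients(d_a, rank_l)
    # Tr(I) = d^2, Tr(F) = d, Tr(F F) = d^2 on the doubled system A A'.
    trace_ok = alpha * d_a**2 + beta * d_a == rank_l       # matches Tr F_{A1 A1'} = L
    swap_ok = alpha * d_a + beta * d_a**2 == rank_l**2     # matches Tr(F_{A1 A1'} F_{AA'}) = L^2
    # Bounds used in the last step of eq. (25).
    bounds_ok = (alpha <= Fraction(rank_l, d_a**2)
                 and beta <= Fraction(rank_l**2, d_a**2))
    return trace_ok and swap_ok and bounds_ok

# Exhaustive check over small dimensions 2 <= d_A <= 11 and 1 <= L <= d_A.
all_ok = all(check(d, l) for d in range(2, 12) for l in range(1, d + 1))
```

This is a consistency check only and does not replace the representation-theoretic argument of Appendix B.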

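Proof of Proposition 4. Fix a random measurement according to the description of the Proposition. One way of
doing this is picking N fixed orthogonal subspaces of dimension L, and one of dimension $L' = d_A - NL < L$. The
projectors onto these subspaces followed by a fixed unitary mapping it to $A_1$ we denote by $Q_j$, j = 0, ..., N. Then
put $P_j := Q_j U$ with a Haar distributed random unitary U on A.

Then, by Lemma 6, with $\omega^j_{A_1R} = (P_j \otimes I_R)\, \rho_{AR}\, (P_j \otimes I_R)^\dagger$,

$$ \sum_{j=1}^{N} \left\langle \left\| \omega^j_{A_1R} - \frac{L}{d_A}\, \tau_{A_1} \otimes \rho_R \right\|_1 \right\rangle
   \le N \frac{L}{d_A} \sqrt{\frac{L\, d_R}{D}} \le \sqrt{\frac{L\, d_R}{D}}. \tag{26} $$

This is almost what we want, except that we haven't taken into account the normalisation: with $p_j = \mathrm{Tr}\, \omega^j_{A_1R}$ and
$\rho^j_{A_1R} = \frac{1}{p_j}\, \omega^j_{A_1R}$, we need to argue that on average, the $p_j$ are close to $\frac{L}{d_A}$. Indeed, eq. (26) implies

$$ \sum_{j=1}^{N} \left\langle \left| p_j - \frac{L}{d_A} \right| \right\rangle \le \sqrt{\frac{L\, d_R}{D}}, \tag{27} $$

hence we obtain

$$ \sum_{j=1}^{N} \left\langle p_j \left\| \rho^j_{A_1R} - \tau_{A_1} \otimes \rho_R \right\|_1 \right\rangle \le 2\sqrt{\frac{L\, d_R}{D}}. \tag{28} $$

Finally, it is clear that $\left\langle \mathrm{Tr}(\rho_A P_0^\dagger P_0) \right\rangle = \frac{L'}{d_A} \le \frac{L}{d_A}$, and since the trace distance of two states is at most 2, we get the
result as advertised, because the quantum error is composed of the probability of hitting $P_0$ and the sum of the error
terms of the $P_j$, weighted by their probabilities. Now we can apply Proposition 3. □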

So, if $d_R \ll D$ there is a merging LOCC protocol with small error and entanglement cost up to $\log d_R - \log D$ (i.e.,
the negative of this is the amount of entanglement produced). If $d_R \not\ll D$, consider the state $\psi_{ABR} \otimes (\Phi_K)_{A_0B_0}$ with
a maximally entangled state of Schmidt rank $K \gg d_R / D$. Now merging is possible (with L = 1); the entanglement
cost is log K, and it can be made as small as $\log d_R - \log D$.



IV. PROOF OF THE MAIN THEOREM 
A. Achievability of merging 

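Proof of Theorem 2. We will first prove the direct part saying that the rates are achievable. Consider n copies of the
state $|\psi\rangle_{ABR}$, and assume first that S(A|B) < 0.

We would like to use our one-shot version, Proposition 4, but cannot do so directly, since the dimension $d_R$ and the
number $(\mathrm{Tr}\, \rho_{AR}^2)^n$ are not information theoretically meaningful.

Instead, we consider the vector $|\Omega\rangle_{\tilde{A}\tilde{B}\tilde{R}}$ and state $|\Psi\rangle_{\tilde{A}\tilde{B}\tilde{R}}$, with

$$ |\Omega\rangle_{\tilde{A}\tilde{B}\tilde{R}} := (\Pi_{\tilde{A}} \otimes \Pi_{\tilde{B}} \otimes \Pi_{\tilde{R}})\, |\psi\rangle^{\otimes n}_{ABR}, \qquad
   |\Psi\rangle_{\tilde{A}\tilde{B}\tilde{R}} := \frac{1}{\sqrt{\langle\Omega|\Omega\rangle}}\, |\Omega\rangle_{\tilde{A}\tilde{B}\tilde{R}}, \tag{29} $$

where $\tilde{A}$, $\tilde{B}$ and $\tilde{R}$ are the typical subspaces of $A^n$, $B^n$ and $R^n$, respectively, and $\Pi_{\tilde{A}}$, etc. are the projection operators
onto these typical subspaces. In Appendix C we explain what is necessary to know about typicality, in particular we
have:

$$ \langle\Omega|\Omega\rangle = \langle\psi|^{\otimes n} (\Pi_{\tilde{A}} \otimes \Pi_{\tilde{B}} \otimes \Pi_{\tilde{R}}) |\psi\rangle^{\otimes n} \ge 1 - \epsilon, \tag{30} $$

for any $\epsilon > 0$ and large enough n. Indeed, we can choose $\epsilon = 3\exp(-c\delta^2 n)$ with some constant c, where $\delta > 0$ is a
typicality parameter; namely from eq. (C5) in Appendix C we have
$\mathrm{Tr}\, \rho_A^{\otimes n} \Pi_{\tilde{A}},\ \mathrm{Tr}\, \rho_B^{\otimes n} \Pi_{\tilde{B}},\ \mathrm{Tr}\, \rho_R^{\otimes n} \Pi_{\tilde{R}} \ge 1 - \exp(-c\delta^2 n)$.
We obtain the bound (30) from observing

$$ I_{A^n} \otimes I_{B^n} \otimes I_{R^n} - \Pi_{\tilde{A}} \otimes \Pi_{\tilde{B}} \otimes \Pi_{\tilde{R}}
   \le (I_{A^n} - \Pi_{\tilde{A}}) \otimes I_{B^n} \otimes I_{R^n} + I_{A^n} \otimes (I_{B^n} - \Pi_{\tilde{B}}) \otimes I_{R^n} + I_{A^n} \otimes I_{B^n} \otimes (I_{R^n} - \Pi_{\tilde{R}}). \tag{31} $$

Furthermore, for the reduced operators $\Omega_{\tilde{A}}$, $\Omega_{\tilde{B}}$ and $\Omega_{\tilde{R}}$ of $|\Omega\rangle\langle\Omega|$ we have (using eqs. (C10), (C9) and (C6) in Appendix C)

$$ \begin{aligned}
 \mathrm{rank}\, \Omega_{\tilde{A}} &\ge (1 - \epsilon)\, 2^{n[S(A) - \delta]}, \\
 \mathrm{rank}\, \Omega_{\tilde{R}} &\le 2^{n[S(R) + \delta]}, \\
 \Omega_{\tilde{B}} &\le \Pi_{\tilde{B}}\, \rho_B^{\otimes n}\, \Pi_{\tilde{B}} \le 2^{-n[S(B) - \delta]}\, \Pi_{\tilde{B}}.
\end{aligned} \tag{32} $$

Hence we get, for the normalized $\Psi_{\tilde{A}\tilde{B}\tilde{R}}$,

$$ d_{\tilde{A}} \ge (1 - \epsilon)\, 2^{n[S(A) - \delta]}, \qquad d_{\tilde{R}} \le 2^{n[S(R) + \delta]}, \qquad D \ge (1 - \epsilon)^2\, 2^{n[S(B) - \delta]}. \tag{33} $$

By the gentle measurement Lemma 15 (see Appendix A), we obtain from eq. (30)

$$ \left\| \psi^{\otimes n}_{ABR} - \Omega_{\tilde{A}\tilde{B}\tilde{R}} \right\|_1 \le 2\sqrt{\epsilon}, \quad \text{hence} \quad
   \left\| \psi^{\otimes n}_{ABR} - \Psi_{\tilde{A}\tilde{B}\tilde{R}} \right\|_1 \le 3\sqrt{\epsilon}. \tag{34} $$

Now Alice and Bob follow a merging protocol as if they had $\Psi_{\tilde{A}\tilde{B}\tilde{R}}$ and with $L = 2^{n[S(B) - S(R) - 3\delta]}$. If the state were
actually $\Psi_{\tilde{A}\tilde{B}\tilde{R}}$, the quantum error would be

$$ Q_e \le 2\sqrt{\frac{L\, d_{\tilde{R}}}{D}} + 2\frac{L}{d_{\tilde{A}}} \le \frac{2}{1 - \epsilon}\, 2^{-n\delta/2} + \frac{1}{1 - \epsilon}\, 2^{1 - 2n\delta}. \tag{35} $$

(Observe that S(B) - S(R) ≤ S(A) by subadditivity.) So, by Proposition 3 we would get a merging protocol with
error $O(2^{-n\delta/4})$. By eq. (34), running the same protocol on $\psi^{\otimes n}_{ABR}$, we obtain an error of $O(2^{-n\delta/4}) + O(2^{-cn\delta^2/2})$,
which vanishes exponentially as $n \to \infty$. Since $\delta > 0$ was arbitrary, the direct part follows.

It remains to consider the case when S(A|B) is non-negative. Here, Alice and Bob share additionally $n(S(A|B) + \Delta)$
maximally entangled states. Each ebit contributes conditional entropy -1, so that the final state has negative
conditional entropy $-n\Delta$. Then however merging can be done by LOCC, as we have proven above.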
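
The reduction from the positive to the negative case is pure bookkeeping, which the following minimal sketch spells out. The values of n, S(A|B) and the slack Delta are hypothetical and chosen to be exactly representable in floating point; the function name is likewise an assumption of this example.

```python
import math

def pad_with_ebits(n, s_a_given_b, delta):
    """Ebits shared in advance and the total conditional entropy after padding.

    n copies contribute n * S(A|B); each added ebit contributes -1.
    """
    ebits = math.ceil(n * (s_a_given_b + delta))
    padded_conditional_entropy = n * s_a_given_b - ebits
    return ebits, padded_conditional_entropy

# Hypothetical source with S(A|B) = 0.25 and slack Delta = 0.125.
ebits, padded = pad_with_ebits(n=1000, s_a_given_b=0.25, delta=0.125)
rate = ebits / 1000   # entanglement invested per input copy
```

After padding, the total conditional entropy is negative, so the LOCC protocol for the negative case applies, and the invested rate S(A|B) + Delta can be brought as close to S(A|B) as desired by shrinking Delta.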

Remark 7 Note that despite the generality of the definition of merging, our protocol is much more special. The
definition allows one to start and end with certain amounts of ebits, but the amount charged is only the difference, so that it
would be conceivable that to achieve the conditional entropy some catalytic use of entanglement is necessary. However,
our protocol either needs no initial entanglement and outputs some (if S(A|B) < 0) or produces no entanglement but
needs some initially (if S(A|B) > 0).



B. Merging is optimal 

Let us now turn to the converse part. The essence of the proof is that entanglement cannot increase under local
operations and classical communication and transmission of n qubits more than by n [20]. We will consider preservation
of Bob's entanglement with Alice and the Reference. The initial entanglement $E_{in}$ includes the entanglement of the
shared state plus any initial resource of pure entanglement log K. Initially, it is nS(B) + log K as the initial state was
just $\psi^{\otimes n}_{ABR}$. The final entanglement $E_{out}$ includes the entanglement of the final state plus the final resource, log L bits
of pure state entanglement, and is

$$ E_{out} \approx nS(AB) + \log L. \tag{36} $$

Since Alice and Bob used only LOCC operations, we have

$$ E_{out} \le E_{in}, \tag{37} $$

as entanglement could only decrease, giving $R = \frac{1}{n}(\log K - \log L) \ge S(AB) - S(B)$.

In more detail, assume $L \le 2^{O(n)}$ for technical reasons. The LOCC protocol (which is also LOCC between Bob
and Alice+Reference) can be thought of as generating an ensemble $\{\varphi^k_{A_1B_1B'^nB^nR^n}, q_k\}$ of pure states. Monotonicity
of the entropy of entanglement under LOCC [21] means

$$ nS(B) + \log K \ge \sum_k q_k\, S\big(\varphi^k_{B_1B'^nB^n}\big). \tag{38} $$

The condition (4) for successful merging translates into

$$ \sum_k q_k\, F\big(\varphi^k_{A_1B_1B'^nB^nR^n}, (\Phi_L)_{A_1B_1} \otimes \psi^{\otimes n}_{B'BR}\big) \ge 1 - \epsilon, \tag{39} $$

thanks to the linearity of the fidelity when one argument is pure. Using eq. (A4) in Appendix A this yields

$$ \sum_k q_k \left\| \varphi^k_{A_1B_1B'^nB^nR^n} - (\Phi_L)_{A_1B_1} \otimes \psi^{\otimes n}_{B'BR} \right\|_1 \le 2\sqrt{\epsilon}, \tag{40} $$

hence by monotonicity of the trace norm under partial tracing,

$$ \sum_k q_k \left\| \varphi^k_{B_1B'^nB^n} - \tau_{B_1} \otimes \psi^{\otimes n}_{B'B} \right\|_1 \le 2\sqrt{\epsilon}. \tag{41} $$

By Fannes' inequality (stated as Lemma 16 in Appendix A), this finally gives

$$ \sum_k q_k \left| S\big(\varphi^k_{B_1B'^nB^n}\big) - \log L - nS(AB) \right|
   \le (\log L + n \log d_A + n \log d_B)\, \eta(2\sqrt{\epsilon}) \le O(n)\, \eta(2\sqrt{\epsilon}), \tag{42} $$

using the concavity of the $\eta$-function. With eq. (38), we thus get

$$ \frac{1}{n}(\log K - \log L) \ge S(A|B) - O(1)\, \eta(2\sqrt{\epsilon}), \tag{43} $$

which results in the converse when $n \to \infty$ and $\epsilon \to 0$. □



C. Classical communication cost of merging 

In our protocol for quantum state merging, the amount of classical communication that Alice needs to send Bob
is given by the number of possible measurement outcomes: at most $D_A D_R / D_B + 1$, which in the i.i.d. case $\psi^{\otimes n}_{ABR}$ means a
rate of S(A) + S(R) - S(B) = I(A : R). Note that this is true regardless of whether S(A|B) > 0 or S(A|B) < 0.

We now show that this amount of communication is needed, and thus our protocol is communication optimal. 

Theorem 8 For a state $|\psi\rangle_{ABR}$ shared by Alice, Bob and the Reference, the classical communication cost of merging is
equal to the quantum mutual information between Alice and the reference system R, I(A : R) = S(A) + S(R) - S(AR).

Proof. We will first need to take a short digression. Consider a protocol which achieves merging with an entanglement
rate $R_q$ and classical communication at rate $R_c$. Now let us imagine that the parties do not have access to a classical
channel, so must send all their classical communication via the quantum channel, encoded into qubits. This gives a
fully quantum version of merging [22] similar to the "mother protocol" (see [23] for an alternative, direct proof). If
$R_q = S(A|B)$ and $R_c = I(A:R)$, we have, in the "sloppy" notation of [24],

$\frac{1}{2} I(A:R)\,[q \to q] \ge \frac{1}{2} I(A:B)\,[qq] + \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle$, (44)

where the inequality means that the protocol consumes a rate of $\frac{1}{2}I(A:R)$ uses of a noiseless qubit channel $[q \to q]$ and produces $\frac{1}{2}I(A:B)$
bits of shared entanglement $[qq]$ in addition to achieving state merging from Alice to Bob. The latter is represented
by $\langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle$, i.e. an identity channel from Alice to Bob working on the source $\rho_{AB}$.

We briefly sketch how state merging gives the protocol of eq. (44). Our merging protocol is expressed in the resource
inequality formalism as

$S(A|B)\,[qq] + I(A:R)\,[c \to c] \ge \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle$, (45)

where $[c \to c]$ stands for the communication resource of 1 classical bit. Recall that for any state merging protocol,
the classical communication must be completely decoupled from the sent state for $|\psi\rangle_{ABR}$ to remain pure, and thus
it can be recycled as $R_c$ bits of entanglement; the entanglement can further be used to send quantum states. This is
what the authors of [24, 25] call Rule I, where each bit of classical communication (denoted $[c \to c]$) can be made
coherent: we denote a coherent classical [26] bit by $[q \to qq]$. At the left hand side of an inequality like (45), Rule
I says that it can be replaced by half a bit of a quantum channel on the left and half a bit of shared entanglement
on the right hand side (denoted $\frac{1}{2}[q \to q] - \frac{1}{2}[qq]$). One sees this by sending the classical communication used in
teleportation as coherent qubits which are then recycled into entanglement. Thus,

$[c \to c] = \frac{1}{2}[q \to q] - \frac{1}{2}[qq]$. (46)

Applying Rule I of eq. (46) to eq. (45), and rearranging the terms gives the mother protocol in the formulation of 
eq. (44). 
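For the reader's convenience, the rearrangement can be written out step by step. This is only a sketch of the bookkeeping, under the assumption that Rule I is applied in the form "replace each $[c \to c]$ on the left by $\frac{1}{2}[q \to q] - \frac{1}{2}[qq]$"; the entropy identity in the last two lines uses $S(R) = S(AB)$ and $S(AR) = S(B)$, which hold because $\psi_{ABR}$ is pure.

```latex
\begin{align*}
S(A|B)\,[qq] + I(A:R)\,[c \to c]
  &\ge \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle
  && \text{eq. (45)} \\
S(A|B)\,[qq] + \tfrac12 I(A:R)\,[q \to q] - \tfrac12 I(A:R)\,[qq]
  &\ge \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle
  && \text{Rule I, eq. (46)} \\
\tfrac12 I(A:R)\,[q \to q]
  &\ge \bigl(\tfrac12 I(A:R) - S(A|B)\bigr)[qq]
     + \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle
  && \text{rearranging} \\
\tfrac12 I(A:R) - S(A|B)
  &= \tfrac12\bigl[S(A) + S(AB) - S(B)\bigr] - S(AB) + S(B) \\
  &= \tfrac12\bigl[S(A) + S(B) - S(AB)\bigr] = \tfrac12 I(A:B)
\end{align*}
```

Substituting the last identity into the third line reproduces eq. (44) term by term.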

We now show that the mother is an optimal protocol to achieve state merging in the case when one doesn't have
access to a classical channel (see also [23]). We use the fact that a necessary condition for any state merging protocol
is that Alice must completely decouple herself from the state $|\psi\rangle_{ABR}$. This is because the state needs to be shared
by R and B by definition of state merging.

Whatever Alice does, including measurements and processing, we may consider coherently, as an operation which
takes $\rho_A$ and some ancillas, and produces a part which gets sent down the quantum channel, and a part $\rho_{A'}$ she
retains. This results in a state $|\psi'\rangle_{BB'R}$ which has high fidelity with $|\psi\rangle_{ABR}$, plus some entanglement between Alice
and Bob. Now, using standard quantum cryptographic reasoning originating in [27], if $|\psi'\rangle_{BB'R}$ is (almost) pure, then
the system A' must be virtually in a product state with B'BR. In particular, the mutual information between the
state $\rho_{A'}$ and the reference system R must be close to zero. Each qubit sent can reduce Alice's mutual information
with the reference system by at most 2; thus, at a minimum, Alice must send $\frac{1}{2}I(A:R)$ qubits down the quantum
channel. This gives the optimality of Alice's use of the quantum channel in protocol (44).

That at most $\frac{1}{2}I(A:B)$ bits of entanglement are obtainable from the shared state, when sending $\frac{1}{2}I(A:R)$ qubits,
can be easily seen as follows. Observe that the $\frac{1}{2}I(A:R)[q \to q]$ on the left hand side of eq. (44) can be replaced by
$\frac{1}{2}I(A:R)[qq] + I(A:R)[c \to c]$ due to teleportation. If the entanglement rate on the right were larger than $\frac{1}{2}I(A:B)$,
we could perform state merging with entanglement rate strictly smaller than $\frac{1}{2}I(A:R) - \frac{1}{2}I(A:B) = S(A|B)$,
contradicting the converse of Theorem 2.

Now, to prove optimality of the classical communication in eq. (45), consider a hypothetical state merging protocol

$R_q[qq] + R_c[c \to c] \ge \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle$, (47)

which we may transform using Rule I [24, 25] into

$\bigl(R_q - \frac{1}{2}R_c\bigr)[qq] + \frac{1}{2}R_c[q \to q] \ge \langle \mathrm{id}_{A \to B'} : \rho_{AB} \rangle$. (48)

Comparing this with the mother protocol (44), we have that $R_c \ge I(A:R)$ by virtue of the optimality of (44);
$R_q \ge S(A|B)$ comes out again, as it should. □

Thus in addition to giving an operational interpretation for the quantum conditional entropy, merging gives an
operational interpretation for the quantum mutual information. Secondly, the measurement of Alice makes her state
completely product with R, thus reinforcing the interpretation of quantum mutual information as the minimum
entropy production of any local decorrelating process [28, 29]. This same quantity is also equal to the amount of
irreversibility of a cyclic process: Bob initially has a state, then gives Alice her share (communicating S(A) qubits),
which is finally merged back to him (communicating S(A|B) qubits). The total quantum communication of this cycle
is I(A : R) quantum bits.
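The entropy identities behind this bookkeeping are easy to confirm numerically. The sketch below is an illustration only: it draws a random pure state on A, B, R (the dimensions 2, 3, 4 and the seed are arbitrary assumptions) and checks that S(A) + S(A|B) = I(A:R) and that $\frac{1}{2}I(A:R) - \frac{1}{2}I(A:B) = S(A|B)$.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems in keep."""
    keep = list(keep)
    rest = [i for i in range(len(dims)) if i not in keep]
    d_keep = int(np.prod([dims[i] for i in keep]))
    m = psi.reshape(dims).transpose(keep + rest).reshape(d_keep, -1)
    return m @ m.conj().T

# Random tripartite pure state; dimensions and seed are illustrative choices.
rng = np.random.default_rng(7)
dims = (2, 3, 4)
A, B, R = 0, 1, 2
psi = rng.normal(size=24) + 1j * rng.normal(size=24)
psi /= np.linalg.norm(psi)

def S(*keep):
    """Entropy of the reduced state on the listed subsystems."""
    return entropy(reduced(psi, dims, keep))

S_A_given_B = S(A, B) - S(B)            # conditional entropy S(A|B)
I_AR = S(A) + S(R) - S(A, R)            # mutual information I(A:R)
I_AB = S(A) + S(B) - S(A, B)            # mutual information I(A:B)

cycle_cost = S(A) + S_A_given_B         # Bob -> Alice, then merging back
mother_gap = 0.5 * I_AR - 0.5 * I_AB    # entanglement balance of eq. (44)

print(cycle_cost - I_AR, mother_gap - S_A_given_B)
```

Both printed differences vanish up to rounding, because purity of the overall state gives S(AB) = S(R) and S(AR) = S(B).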

Having concluded our proofs regarding state merging, we now turn to its applications. 

V. DISTRIBUTED COMPRESSION 

In usual Schumacher compression, a single party, Alice, receives a state from a source, and must compress the states
so that they can be faithfully decoded by another party. For a source emitting states with density matrix $\rho_A$, this
can be done at a rate given by the entropy S(A) of the source [3]. One can imagine the situation where the states are
distributed over many parties, and have to be compressed individually. Each party then sends their compressed share
to a decoder who must be able to decode the full state. In the classical case, this problem was solved by Slepian and
Wolf [2] who found that the total rate for distributed compression could equal the compression rate when the parties
are not distributed. In the quantum case, previous results [9, 30] were interpreted as indications that one cannot
compress at the same rate in the distributed vs. non-distributed case. However, using state merging, we will show
that formally the same achievable rate region as in the Slepian-Wolf theorem is obtained.

In detail, we assume that the source emits states with average density matrix $\rho_{A_1 A_2 \ldots A_m}$, and distributes it over
m parties. The parties wish to compress their shares as much as possible so that the full state can be reconstructed
by a single decoder. We allow classical side information for free (we will only need classical communication from
each encoder to the joint decoder), and only ask about the rate $R_i$ of entanglement between the i-th encoder and the
decoder. A tuple $(R_1, \ldots, R_m)$ is achievable if there exists an (m + 1)-party LOCC procedure taking in the source
$\rho^{\otimes n}_{A_1 \ldots A_m}$, purified to a state $\psi^{\otimes n}_{R A_1 \ldots A_m}$, and $n(R_i + \epsilon)$ ebits between $A_i$ and the decoder B, such that the final state is
$\rho_{R^n B'_1 \ldots B'_m}$ with

$F(\rho, \psi^{\otimes n}) \ge 1 - \epsilon$, (49)

and $\epsilon \to 0$ as $n \to \infty$. As always, the reference is passive, and plays no role in the protocol. Note that the rates $R_i$
can be negative here, just as in state merging, meaning that $n(-R_i + \epsilon)$ ebits are returned by the protocol.

Let us first describe the quantum solution for two parties and depict the rate region in Figure 2. If one party
compresses at a rate S(B), then the other party can over-compress at a rate S(A|B), by merging her state with the
state which will end up with the decoder. The only difference between this scenario and the state merging one is that
Bob first compresses his state, and sends it to the decoder, who then decompresses it; Alice then merges her state
with Bob's state which is now at the decoder. This gives us one possible way for the two parties to jointly compress
the states. Time-sharing gives the full rate region, since the bounds evidently cannot be improved.

FIG. 2: The rate region for distributed compression by two parties with individual rates $R_A$ and $R_B$. The total rate $R_{AB}$ is
bounded by S(AB). The top left diagram shows the rate region of a source with positive conditional entropies; the top right
and bottom left diagrams show the purely quantum case of sources where S(B|A) < 0 or S(A|B) < 0. It is even possible that
both S(B|A) and S(A|B) are negative, as shown in the bottom right diagram, but observe that the rate-sum S(AB) has to be
positive.

Analogously, for m parties $A_i$, and all subsets $T \subseteq \{A_1, A_2, \ldots, A_m\}$ holding a combined state with entropy S(T),
the rate sums $R_T = \sum_{A_i \in T} R_{A_i}$ clearly have to obey

$R_T \ge S(T|\bar{T})$ for all sets T, (50)

with $\bar{T} = \{A_1, A_2, \ldots, A_m\} \setminus T$ the complement of set T. This just follows from the converse to Theorem 2: even if
the decoder somehow has all the shares $\bar{T}$, a total rate of at least $S(T|\bar{T})$ is necessary to convey the remaining shares
T.

That this bound can be achieved simply follows from the fact that with $\bar{T}$ at the decoder, each party can in turn
merge their state with what will be at the decoder. So, for example, with four parties, an achievable rate point is
obtained when party $A_1$ sends her state at rate $S(A_1)$ just by regular Schumacher compression, party $A_2$ merges
her state with the first party's state at the decoder with rate $S(A_2|A_1)$, party $A_3$ merges at a rate $S(A_3|A_1A_2)$, and
party $A_4$ at rate $S(A_4|A_1A_2A_3)$, with the total rate being the Schumacher rate $S(A_1A_2A_3A_4)$, etc. These rate tuples
are however just the corners of the region defined by eq. (50); hence time sharing between various combinations of
ordering the encoders gives the full rate region.
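The corner points and the constraints of eq. (50) can be explored numerically. The following sketch is illustrative only: it uses a random three-party source, purified by a reference of dimension 3 (all dimensions, the seed, and the helper names `corner_rates` and `in_region` are assumptions), computes the chain-rule corner point for every ordering of the encoders, and checks that each corner satisfies all the subset constraints.

```python
import numpy as np
from itertools import combinations, permutations

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems in keep."""
    keep = list(keep)
    rest = [i for i in range(len(dims)) if i not in keep]
    d_keep = int(np.prod([dims[i] for i in keep]))
    m = psi.reshape(dims).transpose(keep + rest).reshape(d_keep, -1)
    return m @ m.conj().T

# Subsystems 0, 1, 2 are the encoders A1, A2, A3; subsystem 3 is the reference.
rng = np.random.default_rng(11)
dims = (2, 2, 2, 3)
parties = (0, 1, 2)
psi = rng.normal(size=24) + 1j * rng.normal(size=24)
psi /= np.linalg.norm(psi)

def S(keep):
    """Entropy of the listed subsystems (zero for the empty set)."""
    return entropy(reduced(psi, dims, keep)) if keep else 0.0

def corner_rates(order):
    """Rates S(A_k | earlier parties) when the encoders merge in this order."""
    return {p: S(order[:k + 1]) - S(order[:k]) for k, p in enumerate(order)}

def in_region(rates, tol=1e-9):
    """Check R_T >= S(T | complement of T) for every non-empty subset T."""
    total = S(parties)
    for r in range(1, len(parties) + 1):
        for T in combinations(parties, r):
            comp = tuple(i for i in parties if i not in T)
            if sum(rates[i] for i in T) < total - S(comp) - tol:
                return False
    return True

corners = {order: corner_rates(order) for order in permutations(parties)}
for order, rates in corners.items():
    print(order, [round(rates[p], 4) for p in parties], in_region(rates))
```

Every corner sums to the Schumacher rate of the joint source, and individual entries may be negative when the corresponding conditional entropy is negative.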



VI. QUANTUM SOURCE CODING WITH SIDE INFORMATION AT THE DECODER 

Related to distributed compression is the case where only Alice's state needs to arrive at the decoder, while Bob can 
send part of his state to the decoder (subject to a rate constraint) in order to help Alice lower her rate. The classical 
case of this problem was introduced by Wyner [31]. For the quantum case, we demand that the full state $\psi_{ABR}$ be
preserved in the protocol, but do not place any restriction on what part of Bob's state may be at the decoder and 
what part can remain with him, while Alice's has to go to the decoder. 

To arrive at a formal definition, we would like to speak of two rates $R_A$ and $R_B$ here, of entanglement between
Alice and the decoder C and of Bob and the decoder C. Starting with n copies of the source, $\psi_{A^nB^nR^n} = \psi_{ABR}^{\otimes n}$,
we may consider LOCC protocols between A, B and C, that take in this state and maximally entangled states
of Schmidt rank $K_A$ ($K_B$) between A and C (B and C). The protocol is supposed to produce a high-fidelity approximation
of $\psi_{C'^n C'' B' R^n}$ tensored with maximally entangled states of Schmidt rank $L_A$ ($L_B$) between A and C (B and C),
where $\psi_{C'^n C'' B' R^n}$ is obtained from $\psi_{ABR}^{\otimes n}$ by substituting $C'^n$ for $A^n$ and applying an isometry (e.g. a unitary operation
taking one system to two systems) $B^n \to C''B'$. If in the limit of arbitrary block length the fidelity tends to 1
and $\frac{1}{n}(\log K_A - \log L_A) \to R_A$, $\frac{1}{n}(\log K_B - \log L_B) \to R_B$, we call the rate pair $(R_A, R_B)$ achievable, and the side
information problem is to characterise the achievable pairs as concisely as possible.

Using state merging we can see that for any isometry $T : B \to U \otimes V$, the rates

$R_A = S(A|U)$ and $R_B = E_P(AU : R) - S(A|U)$ (51)

are achievable, where $\psi_{AUVR} = (\mathrm{id}_A \otimes T \otimes \mathrm{id}_R)\psi_{ABR}$, and

$E_P(AU : R) = \min_\Lambda S\bigl((\mathrm{id}_{AU} \otimes \Lambda)\rho_{AUV}\bigr)$ (52)

is the so-called entanglement of purification [32] of the state $\rho_{AUR}$ with respect to the split AU-R. The minimum
is taken over all channels $\Lambda$ acting on V. The entanglement of purification is in some sense a measure of total
correlations, as it can be interpreted as the amount of entanglement needed to create a state, if the only allowed
operation is tracing out.

The achievability of rates can be seen as follows: the channel $\Lambda$ can be represented, with the help of an environment
B', as another isometry $V \to WB'$, so that $\psi_{AUVR}$ is mapped to $\psi_{AUWB'R}$. Now, with many copies, let Bob send
the system U to the decoder, at rate S(U), and Alice merge her state to the decoder, at rate $R_A = S(A|U)$. Finally,
with the decoder now having AU, let Bob merge W to the decoder, which has rate S(W|AU), so that the total of Bob's rate
is $R_B = S(U) + S(W|AU) = S(AUW) - S(A|U)$. The minimisation over W leads to the formula for the entanglement
of purification.

Here, the isometry T acts on many copies of B, and up to this "regularisation limit", the rate pairs (51) are optimal
for one-way protocols. To see why this is so, consider that at the end of the protocol, Bob will have sent part of his
state to the decoder. This part, U, is obtained by some local isometry of Bob's: $B^n \to UV$. Likewise, Alice will
have sent all her $A^n$ to the decoder. The total amount of entanglement used, $n(R_A + R_B)$, cannot be less than the
total entropy of what ends up at the receiver, which has entropy $S(A^nU)$, and this is lower bounded by $E_P(A^nU : R^n)$.
By the converse of Theorem 2, Alice's entanglement cost, $nR_A$, cannot be less than $S(A^n|U)$. Thus we have proved
that the set of achievable pairs is given by

$\bigcup_{n=1}^{\infty} \bigl\{ \frac{1}{n}\bigl(S(A^n|U),\ E_P(A^nU : R^n) - S(A^n|U)\bigr) \text{ s.t. } T : B^n \to UV \text{ isometry} \bigr\}$. (53)

(Note that since the formula doesn't mention V, we may actually look at channels $B^n \to U$.)

Because T acts on many copies of B, it is unclear whether a single-letter formula for the achievable rate region can
be obtained, potentially by finding a better (lower) expression for Bob's rate. Indeed, in the classical case, this is
what happens [31]. For classical random variables X and Y with Alice and Bob, respectively, the single-letterized
rate for Bob is given by imagining a channel $Y \to W$. Bob needs to send only I(W : X) bits of W rather than H(W).
While the quantum protocol above is clearly optimal, it may be that the entanglement of purification is non-additive,
and thus S(U) may be much lower than $nS(U_1)$, where $\rho_{U_1}$ is the state obtained by acting with a channel on single copies
of $\rho_B$.

Source coding with side information at the encoder 

In the classical case, if a party aims to send her variable to the decoder, having herself access to some side information 
is of no additional value. If Alice wants to send classical variable X to Bob, she cannot lower her rate by sending 
or even knowing additional information. In the quantum world, this is not the case, as can be seen from the side 
information problem in the case of one party. We consider Alice, who has state $\rho_{A_1}$ and is required to send it to
Bob. This she can do using state merging at rate $S(A_1|B)$. However, if she also has access to state $\rho_{A_2}$ which may
be entangled or correlated with $\rho_{A_1}$, then she may be able to do better. This better rate is obtained by sending part of
$\rho_{A_2}$ as well - so in some cases, less is more!

Applying an isometry $T : A_2 \to A'_2 A''_2$, and actually merging $A_1A'_2$, she can achieve a rate $S(A_1A'_2|B)$. Hence one
would naturally minimize over channels T:

$R \ge \min_T S(A_1A'_2|B)$. (54)

As argued in the side information problem, the right hand side is equal to $E_P(A_1B : R) - S(B)$. Essentially, due to
the non-monotonicity of the von Neumann entropy, it can be beneficial to lower the entropy of what you are sending,
by merging additional quantum states which are entangled with what you needed to send.

VII. MULTIPARTITE ENTANGLEMENT OF ASSISTANCE 

In this section we consider the multipartite entanglement of assistance [14]. Sometimes it is called localizable
entanglement [15], although we operate in the regime of many copies and collective measurements. Consider a pure
m-partite state $\psi_{A_1A_2\ldots A_m}$. The entanglement of assistance is defined for two fixed nodes $A_i$ and $A_j$, as the maximal
pure entanglement that can be obtained between those nodes by LOCC operations performed by all the parties. Here
is a more precise definition:

Definition 9 For an m-partite pure state, consider a measurement performed by LOCC that leads to pure states
between chosen nodes $A_i$ and $A_j$ for any outcome k of the measurement. Let the probability of the outcome k be $p_k$,
and the entropy of the node i (equal to the entropy of the node j) be denoted by $S_k(A_i)$. The entanglement of assistance
between the nodes $A_i$ and $A_j$ is defined as

$E_A(\psi, A_i : A_j) = \sup \sum_k p_k S_k(A_i)$, (55)

where the supremum is taken over the above measurements. The asymptotic entanglement of assistance is given by the
regularization of the above quantity:

$E_A^\infty(\psi, A_i : A_j) = \lim_{n \to \infty} \frac{1}{n} E_A(\psi^{\otimes n}, A_i : A_j)$. (56)

Asymptotic entanglement of assistance was determined for pure states of up to four parties in [33]. Namely, it was
proven that for $m \le 4$ the maximal amount of entanglement that can be distilled between Alice and Bob, with the
help of the other m - 2 parties $C_1, \ldots, C_{m-2}$, is given by the minimum entanglement across any bipartite cut of the
system which separates Alice from Bob:

$E_A^\infty(\psi, A : B) = \min_T \{S(AT), S(B\bar{T})\} =: E_{\text{min-cut}}(\psi, A : B)$, (57)

where the minimum is taken over all possible partitions of the other parties into a group T and its complement
$\bar{T} = \{C_1, \ldots, C_{m-2}\} \setminus T$.

In [13] we generalized this result to an arbitrary number of parties, by use of the primitive of state merging. The
result is clearly optimal - one cannot increase entanglement by LOCC. The entropy of any splitting T which divides A
from B is a measure of the entanglement of the total pure state between AT and $B\bar{T}$ and it cannot increase during the
protocol - in fact not by any protocol allowing arbitrary joint operations of the two groups AT and $B\bar{T}$ and classical
communication. Thus all entropies under such splitting serve as an upper bound for the amount of entanglement
which can be distilled between A and B.
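To make the min-cut quantity concrete, the following sketch enumerates all cuts for a random four-party pure state and takes the smallest cut entropy. It is an illustration under stated assumptions only: the state, its dimensions, the seed, and the helper name `cut_entropies` are hypothetical choices, and the code evaluates the single-copy entropies entering eq. (57), not a distillation protocol.

```python
import numpy as np
from itertools import combinations

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems in keep."""
    keep = list(keep)
    rest = [i for i in range(len(dims)) if i not in keep]
    d_keep = int(np.prod([dims[i] for i in keep]))
    m = psi.reshape(dims).transpose(keep + rest).reshape(d_keep, -1)
    return m @ m.conj().T

# Subsystems: 0 = Alice, 1 = Bob, 2 and 3 = helpers C1, C2.
rng = np.random.default_rng(3)
dims = (2, 2, 3, 2)
A, B = 0, 1
helpers = (2, 3)
psi = rng.normal(size=24) + 1j * rng.normal(size=24)
psi /= np.linalg.norm(psi)

def S(keep):
    """Entropy of the reduced state on the listed subsystems."""
    return entropy(reduced(psi, dims, keep))

def cut_entropies():
    """Map each helper group T to the pair (S(A T), S(B T-bar))."""
    out = {}
    for r in range(len(helpers) + 1):
        for T in combinations(helpers, r):
            comp = tuple(h for h in helpers if h not in T)
            out[T] = (S((A,) + T), S((B,) + comp))
    return out

cuts = cut_entropies()
min_cut = min(s_at for s_at, _ in cuts.values())

for T, (s_at, s_bt) in cuts.items():
    print(T, round(s_at, 4), round(s_bt, 4))
print("min-cut entanglement:", round(min_cut, 4))
```

The two entries printed for each cut coincide, as they must for a pure overall state, and the minimum can never exceed S(A) or S(B), which are the two extreme cuts.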

The protocol for achieving this optimal rate is as follows: each party in turn merges their state with the remaining
parties on its side of the minimal cut, preserving the minimum cut entanglement. The merging protocol we consider
will be slightly different from the merging protocol considered previously in two respects. As before, the party who
wishes to merge his state with other parties performs a random measurement on his typical subspace. However,
since the receiver will consist of many parties who are separated from one another, the final decoding step (i.e. the
unitary which the receiver performs conditional on the measurement outcome of the sender) will not be performed until
the very end. The second difference is that the senders will perform complete measurements, and will not attempt
to distill additional entanglement between themselves and the receiving parties. This will not affect the merging
condition, but it does mean that the maximally entangled states which would be created between the merging parties
and the receiver will be destroyed. This greatly simplifies the analysis, despite some entanglement being lost. We
only consider entanglement of assistance - i.e. a protocol which attempts to distill entanglement between A and B.
More complicated protocols can be constructed which also result in entanglement between other parties.

Before moving to the protocol, we will need to prove an aspect of state merging already implicit in Theorem 2,
which will serve as a cornerstone of (among other things) proving a formula for asymptotic entanglement of assistance:
for a tripartite pure state $\psi_{ABR}$ with S(R) < S(B), a random rank-1 measurement on the typical subspace $\tilde{A} \subset A^n$
produces states $\psi^j_{B^nR^n}$ such that most of their reduced states $\rho^j_{R^n}$ are close to the state $\rho_R^{\otimes n}$, the reduced state of the
initial state $\psi_{ABR}^{\otimes n}$.

Proposition 10 (Random measurement gives covering) Let $\psi_{ABR}$ be a tripartite pure state with S(R) < S(B),
of which we consider n copies, and consider the state $|\Psi\rangle_{\tilde{A}\tilde{B}\tilde{R}}$ of the proof of Theorem 2 (Section IV) belonging to the
typical subspaces $\tilde{A}\tilde{B}\tilde{R}$. Denote by $\rho_{\tilde{R}}$ the state of system $\tilde{R}$. Let $\{|e_j\rangle\}$ be a basis of $\tilde{A}$ chosen at random according
to the Haar measure, and let $\rho^j_{\tilde{R}}$ be the state obtained on system $\tilde{R}$ upon obtaining outcome j; let $p_j$ be the probability of
this event. Then for any $\epsilon > 0$ and all large enough n, we have

$\mathbb{E}\Bigl(\sum_j p_j \bigl\|\rho^j_{\tilde{R}} - \rho_{\tilde{R}}\bigr\|_1\Bigr) \le \epsilon$, (58)

where the average is taken over the choice of basis.

Proof. This is just the special case of L = 1 in Proposition 4. □ 

With this tool in hand we can analyze the protocol outlined above. Clearly, if m = 2, there is only one cut, and its
entropy is S(A), the entropy of entanglement, and we are done. So, from now on $m \ge 3$.

Assume for the moment that all S(AT) are distinct (we'll come back to this point at the end), and consider helper
$C_{m-2}$. For each set T, clearly $S(AT) = S(B\bar{T})$, by the purity of the overall state. Hence, for the min-cut we can
restrict to looking at the entropies S(AT) and S(BT), with $C_{m-2} \notin T$. For each such set $T \subset \{1, \ldots, m-3\}$,
consider the relative complement $T' := \{1, \ldots, m-3\} \setminus T$. This defines a tripartite system composed of $C_{m-2}$, AT
and BT'. Let $C_{m-2}$ perform a random measurement on his typical subspace $\tilde{C}_{m-2}$, as in Proposition 10. We get (if
only n is large enough), with arbitrarily high probability, states $\psi^j_{ABC_1 \ldots C_{m-3}}$ which by eq. (58) satisfy:

For all T: $S(A^nT^n)_{\psi^j} = S(B^nT'^n)_{\psi^j} = n\bigl(\min\{S(AT), S(BT')\} \pm \delta\bigr)$, (59)

with arbitrarily small $\delta$. In other words, for each such j,

$E_{\text{min-cut}}(\psi^j, A^n : B^n) = n\bigl(E_{\text{min-cut}}(\psi, A : B) \pm \delta\bigr)$, (60)

and that means that the min-cut entanglement is almost preserved (up to an arbitrarily small variation in the rate),
and hence that the reduced state entropies can be assumed to be all distinct (by choosing $\delta$ small enough). Now we
recursively apply the same to $C_{m-3}, \ldots, C_1$. □

Finally, for the assumption that all reduced state entropies are pairwise distinct: this can be enforced if the
parties first "borrow" an arbitrarily small rate of entanglement to distribute singlets between chosen pairs. Then our
distinctness assumption becomes true. In the limit, only a sublinear amount of entanglement is needed to do this,
but on the other hand [34] shows that the asymptotic entanglement landscape of multiple parties does not change
if one allows this sublinear amount - this is due to them being able to always, perhaps inefficiently, extract some
entanglement across any given cut unless across that cut they happen to be in a product state.

Remark 11 Note that a crucial part of the argument of why the minimum cut entropy doesn't change is the use of
random codes. This is because $C_1$'s procedure is universal - it does not depend on the cut. He makes a measurement
which only depends on the typical subspace of his state. The measurement thus serves to merge his state with whichever
grouping of subsystems has the larger entropy compared with the remaining systems. Not all quantum codes have this
feature - for example Devetak codes [8] depend both on the state of the sender, and that of the receiver. The same
applies to [33], which is why there even the argument for m = 4 has to be quite subtle.

It may seem odd that after performing a random measurement, one's state goes to any set of parties which has
more entropy than the remaining parties. Since there are many possible groupings of the parties, for some groupings
a certain party would help receive the state, but for other groupings, that party's state would be left unchanged by
the random measurement. Of course, there is no contradiction, as in the end, at the decoding step, one has to decide
on the grouping, and with fidelity approaching 1 only for many copies of the state.

Conjecture 12 It is awkward that in the recursive procedure described above for m parties we have to first consider 
a measurement on a long block of states, and then for the second measurement blocks of these blocks, etc. 

It seems likely that the simplest random measurement strategy will indeed also work: all m - 2 helpers $C_1, \ldots, C_{m-2}$
measure in a random basis of their respective typical subspaces and broadcast the result to Alice and Bob. They should 
then end up, with high probability, with a state of the min-cut entanglement. 

VIII. CAPACITY REGION FOR THE MULTIPLE ACCESS CHANNEL

We consider a channel with two senders Alice and Bob, and one receiver Charlie; this is the multiple access channel.
For the classical multiple access channel, any rates satisfying the following inequalities are achievable for encoding
independent messages from Alice and from Bob at their respective terminals to Charlie who decodes them jointly:

$R_A \le I(A : C|B)$
$R_B \le I(B : C|A)$ (61)
$R_A + R_B \le I(AB : C)$.

The quantum multiple access channel - where Alice and Bob want to send quantum information - was considered
in [35], and we refer to that paper for the definitions of codes and rate region. In [13], we found that one could use
state merging to find a larger achievable region, including negative rates. Namely, for the quantum multiple
access channel, there is the following region of achievable rates:

$R_A \le I(A\rangle C|B) := I(A\rangle BC)$
$R_B \le I(B\rangle C|A) := I(B\rangle AC)$ (62)
$R_A + R_B \le I(AB\rangle C)$.

The state on which the quantities are evaluated is constructed as follows. Consider two pure states $\psi_{AA'}$ and $\psi_{BB'}$.
Let $\rho_{ABC}$ be the state resulting from the halves A' and B' being sent down the channel:

$\rho_{ABC} = (I_{AB} \otimes \mathcal{N}_{A'B' \to C})(|\psi\rangle\langle\psi|_{AA'} \otimes |\psi\rangle\langle\psi|_{BB'})$. (63)

In the classical theory, only positive rates make sense. In the quantum case, the rates can be meaningful, even if one
of them is negative. For example, when $R_A$ is negative, and $R_B$ is positive, this means that when Alice invests $R_A$
qubits, then Bob can send $R_B$ qubits, as we shall see.

A. Remarks on coherent information 

In [4] the coherent information was introduced and defined in terms of an input state $\rho_A$ and a channel producing
output $\rho_B$ as

$I(A\rangle B) = S(B) - S(AB)$, (64)

that is, as the conditional entropy with a minus sign; this was puzzling because it can be negative. Since it gives the
channel capacity of a quantum channel (by maximizing it over input distributions $\rho_A$), it was unclear how to interpret
negative uses of a channel. We will see that the negative part will acquire operational meaning, in full accordance
with the positive part. We have also defined the conditional coherent information as

$I(A\rangle B|C) = S(B|C) - S(AB|C)$. (65)

We have the useful identity [consistent with eq. (62)]

$I(A_1\rangle B|A_2) = I(A_1\rangle BA_2)$. (66)

That is, conditioning the coherent information is very simple: just erase the bar. Then we have a chain rule of the
same form as the one for mutual information,

$I(A_1A_2\rangle B) = I(A_2\rangle B) + I(A_1\rangle B|A_2)$. (67)

What seems surprising is that conditioning can only increase coherent information! However, this can be explained as
follows. Namely, in classical information theory we have to have situations where conditioning decreases information,
due to lack of monogamy. Indeed, we can have a situation where

$I(X_1 : Y) + I(X_2 : Y) > I(X_1X_2 : Y)$. (68)

(E.g., the three variables could be fully correlated.) Therefore, to save the chain rule, conditioning must decrease
mutual information. However, in the quantum case we always have

$I(A_1\rangle B) + I(A_2\rangle B) \le I(A_1A_2\rangle B)$, (69)

due to strong subadditivity. Now conditioning very often increases coherent information, because we have equality in
the chain rule identity (67).
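The identities (66), (67) and the inequality (69) can be spot-checked numerically. The sketch below is only an illustration: it uses a random pure state on $A_1$, $A_2$, B and a purifying environment E (the dimensions, the seed, and the helper names `coh` and `coh_cond` are assumptions), and evaluates the coherent informations directly from the entropies.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems in keep."""
    keep = list(keep)
    rest = [i for i in range(len(dims)) if i not in keep]
    d_keep = int(np.prod([dims[i] for i in keep]))
    m = psi.reshape(dims).transpose(keep + rest).reshape(d_keep, -1)
    return m @ m.conj().T

# Subsystems: 0 = A1, 1 = A2, 2 = B, 3 = environment E (traced out).
rng = np.random.default_rng(5)
dims = (2, 2, 3, 2)
A1, A2, B = (0,), (1,), (2,)
psi = rng.normal(size=24) + 1j * rng.normal(size=24)
psi /= np.linalg.norm(psi)

def S(keep):
    """Entropy of the reduced state on the listed subsystems."""
    return entropy(reduced(psi, dims, keep))

def coh(X, Y):
    """Coherent information I(X>Y) = S(Y) - S(XY), eq. (64)."""
    return S(Y) - S(X + Y)

def coh_cond(X, Y, C):
    """Conditional coherent information I(X>Y|C) = S(Y|C) - S(XY|C), eq. (65)."""
    return (S(Y + C) - S(C)) - (S(X + Y + C) - S(C))

lhs_66, rhs_66 = coh_cond(A1, B, A2), coh(A1, B + A2)
lhs_67, rhs_67 = coh(A1 + A2, B), coh(A2, B) + coh_cond(A1, B, A2)
lhs_69, rhs_69 = coh(A1, B) + coh(A2, B), coh(A1 + A2, B)

print(lhs_66 - rhs_66, lhs_67 - rhs_67, rhs_69 - lhs_69)
```

The first two differences vanish identically, since (66) and (67) are algebraic identities between entropies, while the third is non-negative by strong subadditivity.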

B. Direct coding theorem: achievability of rates 

To check that the rates satisfying the above conditions are achievable, it is enough to consider one corner, for
example

$R_A = I(A\rangle BC)$, $R_B = I(B\rangle C)$, (70)

which is an upper corner of the rate region, see Figure 3.

When both $I(A\rangle BC)$ and $I(B\rangle C)$ are negative, they are trivially achievable: Alice and Bob do nothing. So in this
case negativity of rates does not appear meaningful, as zero is achievable too, and one always optimises rates over
input states. When $I(A\rangle BC)$ is negative and $I(B\rangle C)$ is positive, again, those rates can be achieved by Alice doing
nothing, and Bob using the standard quantum coding theorem. So again the negative rate is not interesting. There are
therefore two situations which we have to consider:

$I(A\rangle BC) > 0$ and $I(B\rangle C) > 0$, or (71)
$I(A\rangle BC) > 0$ and $I(B\rangle C) < 0$. (72)

It is enough to consider the first one in detail, as the second one is a simple consequence of it. Let us first describe how
Bob and Alice can achieve those rates for communicating quantum messages to C if classical side-communication
is permitted. Alice and Bob prepare (n copies of) states $\psi_{AA'}$ and $\psi_{BB'}$, respectively, and send halves of them down
the channel (inputs $A'^n$ and $B'^n$). Then Bob performs the merging protocol, i.e. he makes the measurement on his
typical subspace in blocks of size $2^{nR_B}$. As previously we label blocks (codes) by j. On average, he obtains a state
close to a $2^{nR_B}$-dimensional maximally entangled state shared with Charlie (who holds the system C), and Bob's part
of the state $\psi_{ABCR}$ is merged with Charlie ($\psi_{ABCR}$ is a purification of $\rho_{ABC}$). Then, Alice shares with Charlie the state
$\rho_{ABC}$, where both parts B and C are now with Charlie. A random measurement by Alice in blocks of size $2^{nR_A}$ will create a
state close to the maximally entangled state of this dimension between Alice and Charlie, after Alice communicates
her results to Charlie. In this way she also merges her part to Charlie; however, this is not important in the present
context.

Let us now show how Alice and Bob can share with Charlie maximally entangled states of suitable dimensions
without classical communication. Namely, both Alice and Bob can perform their measurements before sending halves
of their states $\psi_{AA'}$ and $\psi_{BB'}$ down the channel. They can then send the states $\psi^{j_A}_{AA'}$, $\psi^{j_B}_{BB'}$ that they have obtained
(here $j_A$ and $j_B$ denote the outcomes of measurement). This still requires communication, as they have to tell Charlie
what outcomes they obtained.

FIG. 3: The rate region for the multiple-access channel for two parties with individual rates $R_A$ and $R_B$. The total rate $R_{AB}$
is bounded by $I(AB\rangle C)$. The top left diagram shows the rate region when both rates are positive; the top right and bottom
left diagrams show the case where $I(B\rangle C) < 0$ or $I(A\rangle C) < 0$. I.e. here, Bob (Alice) can invest entanglement so that the other
party can send at a rate $I(A\rangle BC) > R_A > I(A\rangle C)$ ($I(B\rangle AC) > R_B > I(B\rangle C)$). In the bottom right diagram, both parties
may have the option of achieving the higher rate by having the other party invest entanglement.

However, instead of measuring, they can directly prepare the states $\psi^{j_A}_{AA'}$, $\psi^{j_B}_{BB'}$ with fixed $j_A$ and $j_B$ known to Charlie. This
will have the same effect as before, provided they choose labels that guarantee that the merging conditions are satisfied.
Note that the states that Alice and Bob are now sending are close to maximally entangled states (this is guaranteed
by the merging condition). The maximally entangled states to which they are close define the subspaces which go
through the channel and allow correction of errors. These subspaces are codes that, when used by Alice and Bob, allow
them to obtain the above rates. Since our criterion was fidelity with the maximally entangled state, we have obtained
here a coding theorem with small average error.

In our case it was relatively easy to go from one-way classical communication to none, because the states that Alice and
Bob obtain in our one-way protocol are close to maximally entangled states. For more complicated situations see [36].

Finally, consider the case where $I(B\rangle C)$ is negative, eq. (72). The reasoning is very similar: in the scenario with
classical side-communication, Bob sends $-I(B\rangle C)+\epsilon$ halves of maximally entangled states through a noiseless channel
(keeping the other halves), and performs merging, so that afterwards Alice can achieve her rate as above. However, again
Alice and Bob, instead of performing measurements, can send the state that would emerge under some outcome of
the measurement. The difference is that Bob will send the state not only down the noisy channel, but also down the
supplementary noiseless channel, and will share a rate $\epsilon$ of maximally entangled states (thus his overall rate is negative).
This is the more interesting rate point: for Alice to achieve the rate $I(A\rangle BC)$, she requires Charlie to have C and B.
Bob assists in providing this information (which can be understood as additional error correcting information from
inside the channel), but that comes at a price, which is exactly $-I(B\rangle C)$. We thus have an interpretation of negative
channel capacities. □

C. Converse coding theorem 

Here we briefly argue that (up to regularization) the rate region described by our conditions is optimal. The
reasoning is quite standard (see e.g. [5, 37]), therefore we will provide only a sketch of the proof. Suppose that some
rates $R_A$ and $R_B$ are achievable. Consider first the case where they are both positive. This means that Alice and
Bob can send halves of singlets down the channels in such a way that after decoding by Charlie, they share with
Charlie those singlets with fidelity tending asymptotically to one. Alice shares a singlet of dimension $2^{nR_A}$ with
Charlie, and Bob one of dimension $2^{nR_B}$. If they had exact singlets, the coherent informations would be equal
to $I(A\rangle BC) = I(A\rangle C) = nR_A$, $I(B\rangle C) = nR_B$ and $I(AB\rangle C) = n(R_A + R_B)$. Because they share inexact singlets,
we apply asymptotic continuity of coherent information [37] (which plays here the role of Fano's inequality), thanks
to which the coherent informations of the real state, per use of the channel, approach the ideal values in the asymptotic
limit. This means that there exist states such that, if Alice and Bob send halves of them down the channel,
then after Charlie's decoding the coherent informations approach the values from the coding theorem.
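
The statement that an exact maximally entangled state of dimension $2^{nR}$ carries coherent information $nR$ can be checked numerically. The following is a minimal sketch, assuming the standard definition $I(A\rangle C) = S(C) - S(AC)$ with entropies in bits; the function names are illustrative and not taken from the text.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits of a density matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def coherent_information_max_entangled(d):
    """I(A>C) = S(C) - S(AC) for a d-dimensional maximally entangled state."""
    psi = np.eye(d).reshape(d * d) / np.sqrt(d)      # sum_i |i>|i> / sqrt(d)
    rho_ac = np.outer(psi, psi.conj())
    # Partial trace over A; the index layout is (a, c, a', c').
    rho_c = np.einsum('acad->cd', rho_ac.reshape(d, d, d, d))
    return entropy(rho_c) - entropy(rho_ac)

for k in (1, 2, 3):
    print(k, coherent_information_max_entangled(2 ** k))
```

For dimension $d = 2^k$ the value is $k$, since the global state is pure and Charlie's marginal is maximally mixed.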

There are still two issues. First, the states may be mixed: Alice and Bob prepared singlets, however the encoding
procedure may turn them into mixed states. However, coherent information is convex, so that Alice and Bob will not
do worse by sending some pure states. Second, we considered the joint ABC state after Charlie's decoding, while in
the coding theorem we have the state emerging just from the sending by Alice and Bob. However, due to the data processing
inequality [5] (saying that by operating on V one cannot increase $I(U\rangle V)$), the coherent information of the state before
Charlie's decoding can only be greater.

Let us now consider the case when one of the rates (suppose $R_B$) is negative. This means that Bob uses the noiseless
qubit channel an additional $-R_B$ times (per use of the noisy channel), and Alice achieves her rate. It suffices to show
that, if a rate pair $(R_A, R_B)$ where $R_B$ is negative is achievable, then Alice and Bob can create a joint state of the ABC
system such that $I(A\rangle BC) = R_A$ and $I(B\rangle C) = R_B$ per use of the channel. To this end, consider a new channel which
consists of the old one supplemented by $-R_B + \epsilon$ uses of the noiseless channel from Bob to Charlie. For the new
channel, the rates $(R_A, \epsilon)$ are achievable. They are positive, so that, as explained above, there exist states of Alice and
Bob that, sent down the channel, produce a joint state having $I(A\rangle C) = R_A$ and $I(B\rangle C) = \epsilon$. Suppose now that Bob
does not send the part of the system that was intended to go through the noiseless channel, but keeps it. In this situation
they only use the original channel. We will now see that they achieve the needed coherent informations in this way.
Of course $I(A\rangle BC) = R_A$, as this quantity does not depend on whether a given system is with Bob or with Charlie.
Let us now estimate the quantity $I(B\rangle C)$. By sending $-R_B + \epsilon$ qubits, Bob could increase it up to $\epsilon$. However, by
sending one qubit, one can increase the coherent information by no more than one. Thus, the coherent information $I(B\rangle C)$
cannot be smaller than $R_B$. This ends the proof of the converse theorem.

IX. STRONG SUBADDITIVITY 

Using state merging, we can get a very quick and operationally intuitive proof of strong subadditivity [38], which
can be written as

$$S(A|BC) \leq S(A|B). \tag{73}$$

Strong subadditivity is simply the observation that if Bob has access to an additional register C, then Alice surely
doesn't need to send more partial information for him to get the full state $\rho_{AB}$. After all, Bob could always ignore the
ancilla on C, but if he uses it, Alice may need to send him less. Mathematically, we can use this argument because in
the proof that $S(A|B)$ is the optimal merging rate we have used only typical subspaces and elementary probability
for the direct part, and ordinary subadditivity in the converse part.

X. CONCLUSION 

It is very interesting to compare the proof of the classical Slepian-Wolf theorem with the proof of its quantum
version, state merging. The Slepian-Wolf protocol is as follows: the typical sequences of Alice are divided into blocks
of size $\approx 2^{nI(A:B)}$. Note that this is the size of a good code. Now, when a particular sequence occurs, Alice lets Bob
know in which code the sequence lies, and this is enough for him to determine her sequence. Thus the Slepian-Wolf
theorem follows solely from the fact that a random code is a good code, which was shown by Shannon.
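
The counting behind this protocol can be made concrete with a small calculation. The sketch below uses a hypothetical joint distribution (a uniform bit X observed by Bob through a bit flip with probability 0.1, chosen only for illustration) and computes the block size exponent $I(X:Y)$ and the exponent of the number of blocks, $H(X) - I(X:Y) = H(X|Y)$.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical joint distribution p(x, y): X uniform, Y = X flipped with prob. 0.1.
flip = 0.1
p_xy = 0.5 * np.array([[1 - flip, flip],
                       [flip, 1 - flip]])

h_x = shannon(p_xy.sum(axis=1))
h_y = shannon(p_xy.sum(axis=0))
h_xy = shannon(p_xy)
mutual = h_x + h_y - h_xy      # I(X:Y): exponent of the size of one block (code)
rate = h_x - mutual            # exponent of the number of blocks, equals H(X|Y)

n = 100
print(f"block size  ~ 2^{n * mutual:.1f}")
print(f"block count ~ 2^{n * rate:.1f}")
```

Telling Bob the block index therefore costs about $nH(X|Y)$ bits, which is the Slepian-Wolf rate.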

Interestingly, our protocol is based on the same property, especially for states for which the coherent information is
positive. (This could be regarded as a situation analogous to the classical case, as the classical mutual information
is always positive.) Namely, to prove quantum state merging it is enough to know that a random quantum code is a
good quantum code. And in the quantum state merging protocol Alice performs an analogous task: she measures in
which quantum code her state is, and tells Bob the result.

What is extremely surprising is that those similarities turn out to be quite superficial. Namely, in the Slepian-
Wolf protocol, the number of bits needed to tell Bob the information "which code" is just the cost of transmission of
Alice's data to Bob. In the quantum case, the information "which code", being represented by classical bits, is not
counted at all, as we count only the quantum information. Thus in this case (positive coherent information) merging
does not cost anything at all, unlike in the classical case. What is more remarkable still is that despite this difference, the
cost of sending partial quantum information is the conditional entropy, and thus formally similar to the classical case.
This is despite the fact that the classical case does not emerge as a limit from the quantum case. In other words, if
one takes quantum state merging and applies it to classical states (i.e. states which are fully decohered and contain
only classical correlations), then the goal is rather different, as one is attempting to retain entanglement between this
classical state and the reference system, and one is further allowing free classical communication.

We have two ways of interpreting the classical mutual information: (i) either as the quantity responsible for capacity 
or (ii) as the quantity that reports the part of information that is common both to Alice and Bob. Indeed, the latter 
meaning is implied by the fact that the cost of communication needed to transfer full information to Bob is H(X) 
(full information content of Alice's state) reduced by the amount of mutual information. Thus the latter represents 
that part of Alice's information, that Bob also knows, and it need not be transferred to him. 

It turns out that in the quantum case those two notions are no longer represented by the same quantity (see, however,
[39]). Namely, the communication cost is equal to Alice's information reduced by the quantum mutual information.
Thus the quantum mutual information serves as common information. The capacity is, on the other hand, represented by
the coherent information. The first quantity is sometimes greater than the whole of Alice's information, and precisely
in those instances the second quantity has the chance to be positive.
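
The relation between the two quantities can be written out explicitly. The identities below use the standard definitions $I(A:B) = S(A) + S(B) - S(AB)$ and $I(A\rangle B) = S(B) - S(AB)$, which are assumed here rather than quoted from this section:

```latex
\begin{align*}
S(A|B) &= S(AB) - S(B) = S(A) - I(A:B), \\
I(A\rangle B) &= S(B) - S(AB) = I(A:B) - S(A) = -S(A|B).
\end{align*}
```

Hence $I(A\rangle B) > 0$ exactly when $I(A:B) > S(A)$. For a singlet, for instance, $S(A) = 1$ and $I(A:B) = 2$, so $S(A|B) = -1$ and $I(A\rangle B) = 1$.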

It is indeed the beauty of the quantum information world that both quantities into which the classical quantity
has split do their job in a way analogous to the classical case. Indeed, the analogue of common information
counts by how much the transmission cost is reduced, exactly as in the classical case, while the analogue of capacity
is responsible for the protocol, with the same basic elements as in the classical case. The additional brick in the quantum
protocol is teleportation, which is perhaps the thread that binds the two notions together.

However, as we have noted, the analogy in the protocol is quite superficial. Even though Alice performs
operations that can be called by the same name (checking "which code", and telling it to Bob), the meaning of
those operations is completely different. It is extremely mysterious how the quantum and classical cases can have so
much in common, and at the same time can be so different.

APPENDIX A: MISCELLANEOUS FACTS ABOUT NORMS AND FIDELITY

The following lemma relates the trace norm to the Hilbert-Schmidt norm. Recall that these norms are defined, for
an operator $X$, as

$$\|X\|_1 := \mathrm{Tr}\sqrt{X^\dagger X} \quad \text{(trace norm)},$$

$$\|X\|_2 := \sqrt{\mathrm{Tr}\, X^\dagger X} \quad \text{(Hilbert-Schmidt norm)}.$$

Lemma 13 For any operator $X$,

$$\|X\|_1^2 \leq d\, \|X\|_2^2, \tag{A1}$$

where $d$ is the dimension of the support of the operator $X$ (the subspace on which $X$ has nonzero eigenvalues).

Proof. This follows from the convexity of the function $x^2$, where one takes probabilities $1/d$. □
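
A quick numerical check of (A1) can be made with singular values. This is a minimal sketch on a random complex matrix (an arbitrary choice for illustration), using the rank as the dimension $d$ of the support.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 6
x = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

singular_values = np.linalg.svd(x, compute_uv=False)
trace_norm = singular_values.sum()                  # ||X||_1 = Tr sqrt(X^dagger X)
hs_norm = np.sqrt((singular_values ** 2).sum())     # ||X||_2 = sqrt(Tr X^dagger X)
d = int(np.sum(singular_values > 1e-12))            # dimension of the support

print(trace_norm ** 2, d * hs_norm ** 2)
```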
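
The fidelity of two states is given by

$$F(\rho, \sigma) = \left( \mathrm{Tr}\sqrt{\sqrt{\sigma}\, \rho\, \sqrt{\sigma}} \right)^2. \tag{A2}$$

Notice that if one of the states is pure, say $\sigma = |\phi\rangle\langle\phi|$, then

$$F(\rho, |\phi\rangle\langle\phi|) = \langle\phi|\rho|\phi\rangle = \mathrm{Tr}\left(\rho\, |\phi\rangle\langle\phi|\right). \tag{A3}$$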

Lemma 14 The fidelity is related to the trace norm as follows [40]:

$$1 - \sqrt{F(\rho, \sigma)} \leq \frac{1}{2}\|\rho - \sigma\|_1 \leq \sqrt{1 - F(\rho, \sigma)}. \tag{A4}$$
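
The two inequalities in (A4) can be tested on random states. The sketch below assumes the squared form of the fidelity given in (A2); the random-state construction and the helper names are illustrative only.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    evals, evecs = np.linalg.eigh(m)
    evals = np.clip(evals, 0.0, None)
    return (evecs * np.sqrt(evals)) @ evecs.conj().T

def fidelity(rho, sigma):
    """F = (Tr sqrt(sqrt(sigma) rho sqrt(sigma)))^2, as in (A2)."""
    s = psd_sqrt(sigma)
    evals = np.clip(np.linalg.eigvalsh(s @ rho @ s), 0.0, None)
    return float(np.sum(np.sqrt(evals)) ** 2)

def trace_norm_of_difference(rho, sigma):
    """||rho - sigma||_1 for Hermitian rho and sigma."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def random_state(dim, rng):
    """Random density matrix from a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
checks = []
for _ in range(10):
    rho, sigma = random_state(4, rng), random_state(4, rng)
    f = fidelity(rho, sigma)
    half_norm = 0.5 * trace_norm_of_difference(rho, sigma)
    # (lower bound, middle term, upper bound) of (A4)
    checks.append((1 - np.sqrt(f), half_norm, np.sqrt(max(1 - f, 0.0))))
```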

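
Lemma 15 (Gentle measurement) Let $\rho$ be a (subnormalized) state, i.e. $\rho \geq 0$ and $\mathrm{Tr}\,\rho \leq 1$, and let $0 \leq X \leq \mathbb{1}$.
Then, if $\mathrm{Tr}\,\rho X \geq 1 - \epsilon$,

$$\left\| \sqrt{X} \rho \sqrt{X} - \rho \right\|_1 \leq 2\sqrt{\epsilon}. \tag{A5}$$

Proof. See [41], Lemma 9; the better constant above is from [42]. □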
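
The bound (A5) can be illustrated numerically. In the sketch below, the state and the operator $X$ (with eigenvalues drawn from $[0.9, 1]$ in a random basis) are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 4

# Random normalized state.
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = g @ g.conj().T
rho = rho / np.trace(rho).real

# Hypothetical operator 0 <= X <= 1 with eigenvalues close to 1 in a random basis.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim)))
x_eigs = rng.uniform(0.9, 1.0, size=dim)
x = (q * x_eigs) @ q.conj().T
sqrt_x = (q * np.sqrt(x_eigs)) @ q.conj().T

eps = 1.0 - np.trace(rho @ x).real          # Tr(rho X) = 1 - eps
disturbance = float(np.sum(np.abs(np.linalg.eigvalsh(sqrt_x @ rho @ sqrt_x - rho))))

print(disturbance, 2 * np.sqrt(eps))
```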
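
Lemma 16 (Fannes [43]) For states $\rho$ and $\sigma$ on a $d$-dimensional space such that $\|\rho - \sigma\|_1 \leq \epsilon$,

$$|S(\rho) - S(\sigma)| \leq \eta(\epsilon) \log d, \quad \text{with} \quad \eta(x) := \begin{cases} x - x \log x & \text{if } x \leq \frac{1}{e}, \\ x + \frac{\log e}{e} & \text{if } x > \frac{1}{e}. \end{cases} \tag{A6}$$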



APPENDIX B: THE TWIRLING AVERAGE OF EQ. (24) 

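We use the fact that the operator

$$T(X) := \left\langle (U \otimes U)^\dagger_{A\tilde{A}}\, X\, (U \otimes U)_{A\tilde{A}} \right\rangle \tag{B1}$$

is $U \otimes U$-invariant. The representation of $U \otimes U$, however, decomposes into the two irreducible components, the
symmetric and the antisymmetric subspace. By Schur's lemma, the only invariant operators are then linear combinations
of the projections onto these subspaces:

$$\Pi^{\rm sym}_{A\tilde{A}} = \frac{1}{2}\left( \mathbb{1}_{A\tilde{A}} + F_{A\tilde{A}} \right), \qquad \Pi^{\rm anti}_{A\tilde{A}} = \frac{1}{2}\left( \mathbb{1}_{A\tilde{A}} - F_{A\tilde{A}} \right). \tag{B2}$$

Hence, the twirling map $T$ can be written

$$T(X) = \frac{\Pi^{\rm sym}_{A\tilde{A}}}{\mathrm{rank}\, \Pi^{\rm sym}_{A\tilde{A}}}\, \mathrm{Tr}\left( X \Pi^{\rm sym}_{A\tilde{A}} \right) + \frac{\Pi^{\rm anti}_{A\tilde{A}}}{\mathrm{rank}\, \Pi^{\rm anti}_{A\tilde{A}}}\, \mathrm{Tr}\left( X \Pi^{\rm anti}_{A\tilde{A}} \right). \tag{B3}$$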
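
As a sanity check on (B3), the sketch below applies the formula to a swap operator restricted to an $L$-dimensional subspace and compares the result with the closed form $\frac{L(L+1)}{d(d+1)}\Pi^{\rm sym} - \frac{L(L-1)}{d(d-1)}\Pi^{\rm anti}$. The reading of the averaged operator as such a restricted swap, the dimensions $d = 4$, $L = 2$, and all variable names are assumptions made for this illustration.

```python
import numpy as np

d, L = 4, 2                      # illustrative dimensions (assumed)
identity = np.eye(d * d)

# Swap operator F on C^d (x) C^d: F |i>|j> = |j>|i>.
swap = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        swap[j * d + i, i * d + j] = 1.0

pi_sym = (identity + swap) / 2   # projector onto the symmetric subspace, (B2)
pi_anti = (identity - swap) / 2  # projector onto the antisymmetric subspace, (B2)

def twirl(x):
    """Right-hand side of (B3); the rank of a projector equals its trace."""
    return (pi_sym * np.trace(x @ pi_sym) / np.trace(pi_sym)
            + pi_anti * np.trace(x @ pi_anti) / np.trace(pi_anti))

# Swap restricted to an L-dimensional subspace (here: the first L basis vectors).
p = np.diag([1.0] * L + [0.0] * (d - L))
swap_sub = np.kron(p, p) @ swap @ np.kron(p, p)
twirled = twirl(swap_sub)

closed_form = (L * (L + 1) / (d * (d + 1)) * pi_sym
               - L * (L - 1) / (d * (d - 1)) * pi_anti)

# A random unitary from the QR decomposition of a complex Gaussian matrix,
# used to test U (x) U invariance of the twirled operator.
rng = np.random.default_rng(4)
u, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
uu = np.kron(u, u)
invariant = np.allclose(uu.conj().T @ twirled @ uu, twirled)
```

The twirled operator has the same trace as the input, which here equals $L$.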

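This is enough to evaluate our average:

$$\begin{aligned}
\left\langle (U \otimes U)^\dagger_{A\tilde{A}}\, F_{A_1\tilde{A}_1}\, (U \otimes U)_{A\tilde{A}} \right\rangle
&= \frac{2}{d_A(d_A+1)}\, \Pi^{\rm sym}_{A\tilde{A}}\, \mathrm{Tr}\left( F_{A_1\tilde{A}_1} \Pi^{\rm sym}_{A\tilde{A}} \right) + \frac{2}{d_A(d_A-1)}\, \Pi^{\rm anti}_{A\tilde{A}}\, \mathrm{Tr}\left( F_{A_1\tilde{A}_1} \Pi^{\rm anti}_{A\tilde{A}} \right) \\
&= \frac{2}{d_A(d_A+1)}\, \Pi^{\rm sym}_{A\tilde{A}}\, \frac{L^2 + L}{2} + \frac{2}{d_A(d_A-1)}\, \Pi^{\rm anti}_{A\tilde{A}}\, \frac{L - L^2}{2} \\
&= \frac{L(L+1)}{d_A(d_A+1)}\, \frac{\mathbb{1}_{A\tilde{A}} + F_{A\tilde{A}}}{2} - \frac{L(L-1)}{d_A(d_A-1)}\, \frac{\mathbb{1}_{A\tilde{A}} - F_{A\tilde{A}}}{2}.
\end{aligned} \tag{B4}$$

APPENDIX C: TYPICALITY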



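We shall need the concept and a few properties of typical subspaces [3]. Consider $n$ copies of a density matrix $\rho$,
$\rho^{\otimes n}$. Writing $\rho$ in its eigenbasis, $\rho = \sum_i p_i |i\rangle\langle i|$, we note first of all that $S(\rho) = H(p_i)$. Now,

$$\rho^{\otimes n} = \sum_{i^n} p_{i^n} |i^n\rangle\langle i^n|, \tag{C1}$$

with

$$\begin{aligned}
i^n &= i_1 \ldots i_n, \\
p_{i^n} &= p_{i_1} \cdots p_{i_n}, \\
|i^n\rangle &= |i_1\rangle \cdots |i_n\rangle.
\end{aligned} \tag{C2}$$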

For $\delta > 0$, the set of typical sequences is defined as (see [44])

$$T^n_\delta := \left\{ i^n : \left| -\log p_{i^n} - nS(\rho) \right| \leq n\delta \right\}, \tag{C3}$$

and the typical projector [3] is

$$\Pi^n_\delta := \sum_{i^n \in T^n_\delta} |i^n\rangle\langle i^n|. \tag{C4}$$

The typical projector inherits its properties from the set of typical sequences. We quote the following from [3], and
from [30] for the exponential bounds (see also [44]): abbreviating $\Pi = \Pi^n_\delta$,

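$$\mathrm{Tr}\left( \rho^{\otimes n} \Pi \right) \geq 1 - \exp(-c\delta^2 n) \ \text{ with a constant } c, \tag{C5}$$
$$\Pi \rho^{\otimes n} \Pi \leq \rho^{\otimes n}, \tag{C6}$$
$$\Pi \rho^{\otimes n} \Pi \leq 2^{-n[S(\rho) - \delta]}\, \Pi, \tag{C7}$$
$$\Pi \rho^{\otimes n} \Pi \geq 2^{-n[S(\rho) + \delta]}\, \Pi, \tag{C8}$$
$$\mathrm{rank}\, \Pi = \mathrm{Tr}\, \Pi \leq 2^{n[S(\rho) + \delta]}, \tag{C9}$$
$$\mathrm{rank}\, \Pi = \mathrm{Tr}\, \Pi \geq \left( 1 - e^{-c\delta^2 n} \right) 2^{n[S(\rho) - \delta]}.$$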
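
For a qubit state with eigenvalues $(p, 1-p)$ the typical set can be enumerated exactly by grouping sequences according to how often the eigenvalue $p$ occurs. The sketch below does this for illustrative values $p = 0.2$, $n = 200$, $\delta = 0.1$ (chosen here, not taken from the text) and reports the weight $\mathrm{Tr}(\rho^{\otimes n}\Pi)$ and the rank of $\Pi$, which can be compared with the bounds on the rank above.

```python
import math

def typical_set_stats(p, n, delta):
    """Exact weight and size of the typical set for eigenvalues (p, 1 - p)."""
    s = -p * math.log2(p) - (1 - p) * math.log2(1 - p)     # S(rho) = H(p)
    weight, size = 0.0, 0
    for k in range(n + 1):          # k = number of positions carrying eigenvalue p
        log_prob = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_prob - n * s) <= n * delta:            # condition (C3)
            count = math.comb(n, k)
            size += count                        # adds to rank of the typical projector
            weight += count * 2.0 ** log_prob    # adds to Tr(rho^{(x)n} Pi)
    return s, weight, size

p, n, delta = 0.2, 200, 0.1
s, weight, size = typical_set_stats(p, n, delta)
print(f"S(rho) = {s:.4f}, weight = {weight:.4f}, log2(rank) = {math.log2(size):.2f}")
print(f"bounds on log2(rank): {n * (s - delta):.2f} .. {n * (s + delta):.2f}")
```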